Commit

Merge branch 'develop'
dolfinus committed Oct 10, 2023
2 parents 6944c4f + 0bf3ea4 commit 17ed2de
Showing 86 changed files with 1,124 additions and 1,138 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/data/clickhouse/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/core/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/hdfs/matrix.yml
@@ -7,7 +7,7 @@ min: &min
 
 max: &max
   hadoop-version: hadoop3-hdfs
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/hive/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/kafka/matrix.yml
@@ -8,7 +8,7 @@ min: &min
 
 max: &max
   kafka-version: 3.5.1
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
12 changes: 9 additions & 3 deletions .github/workflows/data/local-fs/matrix.yml
@@ -16,12 +16,18 @@ min_excel: &min_excel
   java-version: 8
   os: ubuntu-latest
 
-max: &max
+max_excel: &max_excel
   spark-version: 3.4.1
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
 
+max: &max
+  spark-version: 3.5.0
+  python-version: '3.11'
+  java-version: 20
+  os: ubuntu-latest
+
 latest: &latest
   spark-version: latest
   python-version: '3.11'
@@ -30,13 +36,13 @@ latest: &latest
 
 matrix:
   small:
+    - <<: *max_excel
     - <<: *max
-    - <<: *min_avro
-    - <<: *min_excel
   full:
     - <<: *min
     - <<: *min_avro
     - <<: *min_excel
+    - <<: *max_excel
     - <<: *max
   nightly:
     - <<: *min
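The matrix files above rely on YAML anchors (`&max`) and merge keys (`<<: *max`) to avoid repeating the same version pins in every matrix entry. As a quick illustration of how those expand (a standalone sketch using a trimmed-down stand-in document, not the repository's actual file), PyYAML's `safe_load` resolves YAML 1.1 merge keys:

```python
import yaml  # PyYAML; safe_load resolves YAML 1.1 "<<" merge keys

# A trimmed-down stand-in for the matrix.yml files above (not the real file)
doc = """
min: &min
  spark-version: 2.3.1
  python-version: '3.7'

max: &max
  spark-version: 3.5.0
  python-version: '3.11'

matrix:
  small:
    - <<: *max
  full:
    - <<: *min
    - <<: *max
"""

data = yaml.safe_load(doc)

# "<<: *max" copies every key of the &max mapping into the list item,
# so each matrix entry becomes an independent, fully expanded dict
print(data["matrix"]["small"][0])
print(data["matrix"]["full"][0]["spark-version"])
```

Note that `3.5.0` parses as a string (two dots make it a non-numeric scalar), so no quoting is needed for Spark versions, while `'3.11'` must stay quoted to avoid becoming the float `3.11`.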
2 changes: 1 addition & 1 deletion .github/workflows/data/mssql/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/mysql/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/oracle/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/postgres/matrix.yml
@@ -5,7 +5,7 @@ min: &min
   os: ubuntu-latest
 
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/s3/matrix.yml
@@ -9,7 +9,7 @@ min: &min
 
 max: &max
   minio-version: 2023.7.18
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/data/teradata/matrix.yml
@@ -1,5 +1,5 @@
 max: &max
-  spark-version: 3.4.1
+  spark-version: 3.5.0
   python-version: '3.11'
   java-version: 20
   os: ubuntu-latest
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v4.5.0
     hooks:
       - id: check-ast
       - id: check-case-conflict
@@ -28,7 +28,7 @@ repos:
       - id: remove-tabs
         exclude: ^docs/(make.bat|Makefile)
   - repo: https://github.com/codespell-project/codespell
-    rev: v2.2.5
+    rev: v2.2.6
     hooks:
       - id: codespell
         args: [-w]
@@ -59,7 +59,7 @@ repos:
      - id: rst-inline-touching-normal
      - id: text-unicode-replacement-char
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.13.0
+    rev: v3.15.0
     hooks:
       - id: pyupgrade
         args: [--py37-plus, --keep-runtime-typing]
10 changes: 6 additions & 4 deletions README.rst
@@ -52,7 +52,7 @@ Non-goals
 Requirements
 ------------
 * **Python 3.7 - 3.11**
-* PySpark 2.3.x - 3.4.x (depends on used connector)
+* PySpark 2.3.x - 3.5.x (depends on used connector)
 * Java 8+ (required by Spark, see below)
 * Kerberos libs & GCC (required by ``Hive``, ``HDFS`` and ``SparkHDFS`` connectors)
 
@@ -96,7 +96,7 @@ Supported storages
 +                    +--------------+----------------------------------------------------------------------------------------------------------------------+
 |                    | Samba        | `pysmb library <https://pypi.org/project/pysmb/>`_                                                                   |
 +--------------------+--------------+----------------------------------------------------------------------------------------------------------------------+
-| Files as DataFrame | SparkLocalFS | Apache Spark `File Data Source <https://spark.apache.org/docs/3.4.1/sql-data-sources-generic-options.html>`_         |
+| Files as DataFrame | SparkLocalFS | Apache Spark `File Data Source <https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html>`_        |
 |                    +--------------+ +
 |                    | SparkHDFS    | |
 |                    +--------------+----------------------------------------------------------------------------------------------------------------------+
@@ -179,6 +179,8 @@ Compatibility matrix
 +--------------------------------------------------------------+-------------+-------------+-------+
 | `3.4.x <https://spark.apache.org/docs/3.4.1/#downloading>`_  | 3.7 - 3.11  | 8u362 - 20  | 2.12  |
 +--------------------------------------------------------------+-------------+-------------+-------+
+| `3.5.x <https://spark.apache.org/docs/3.5.0/#downloading>`_  | 3.8 - 3.11  | 8u371 - 20  | 2.12  |
++--------------------------------------------------------------+-------------+-------------+-------+
 
 .. _pyspark-install:
 
@@ -192,7 +194,7 @@ or install PySpark explicitly:
 
 .. code:: bash
 
-    pip install onetl pyspark==3.4.1 # install a specific PySpark version
+    pip install onetl pyspark==3.5.0 # install a specific PySpark version
 
 or inject PySpark to ``sys.path`` in some other way BEFORE creating a class instance.
 **Otherwise connection object cannot be created.**
@@ -530,7 +532,7 @@ Read files directly from S3 path, convert them to dataframe, transform it and th
     setup_logging()
 
     # Initialize new SparkSession with Hadoop AWS libraries and Postgres driver loaded
-    maven_packages = SparkS3.get_packages(spark_version="3.4.1") + Postgres.get_packages()
+    maven_packages = SparkS3.get_packages(spark_version="3.5.0") + Postgres.get_packages()
     spark = (
         SparkSession.builder.appName("spark_app_onetl_demo")
         .config("spark.jars.packages", ",".join(maven_packages))
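The README hunk above mentions injecting PySpark into ``sys.path`` as an alternative to ``pip install pyspark``. One way this is commonly done (a sketch under assumptions: a tarball-style Spark install located via ``SPARK_HOME``, defaulting here to the hypothetical path ``/opt/spark``; the py4j zip name follows the layout Spark distributions ship) is:

```python
import glob
import os
import sys

# Assumed install location; adjust SPARK_HOME to your environment
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# Spark distributions ship their Python bindings under $SPARK_HOME/python,
# with py4j bundled as a zip under python/lib
sys.path.insert(0, os.path.join(spark_home, "python"))
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, py4j_zip)

# After this point `import pyspark` can resolve, so onETL connection
# classes created later are able to find it
```

This only manipulates ``sys.path``; it must run before any onETL connection class is instantiated, matching the "BEFORE creating a class instance" warning above.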
10 changes: 5 additions & 5 deletions docs/changelog/0.9.4.rst
@@ -4,12 +4,12 @@
 Features
 --------
 
-- Add ``if_exists="ignore"`` and ``error`` to ``Hive.WriteOptions`` (:github:pull:`143`)
-- Add ``if_exists="ignore"`` and ``error`` to ``JDBC.WriteOptions`` (:github:pull:`144`)
-- Add ``if_exists="ignore"`` and ``error`` to ``MongoDB.WriteOptions`` (:github:pull:`145`)
 - Add ``Excel`` file format support. (:github:pull:`148`)
 - Add ``Samba`` file connection.
   It is now possible to download and upload files to Samba shared folders using ``FileDownloader``/``FileUploader``. (:github:pull:`150`)
+- Add ``if_exists="ignore"`` and ``error`` to ``Hive.WriteOptions`` (:github:pull:`143`)
+- Add ``if_exists="ignore"`` and ``error`` to ``JDBC.WriteOptions`` (:github:pull:`144`)
+- Add ``if_exists="ignore"`` and ``error`` to ``MongoDB.WriteOptions`` (:github:pull:`145`)
 
 
 Improvements
@@ -21,10 +21,10 @@ Improvements
   * Added interaction schemas for reading, writing and executing statements in Greenplum.
   * Added recommendations about reading data from views and ``JOIN`` results from Greenplum. (:github:pull:`154`)
 - Make ``.fetch`` and ``.execute`` methods of DB connections thread-safe. Each thread works with its own connection. (:github:pull:`156`)
-- Call ``.close()`` on FileConnection then it is removed by garbage collector. (:github:pull:`156`)
+- Call ``.close()`` on ``FileConnection`` then it is removed by garbage collector. (:github:pull:`156`)
 
 
 Bug Fixes
 ---------
 
-- Fix issue while stopping Python interpreter calls ``JDBCMixin.close()`` and prints exceptions to log. (:github:pull:`156`)
+- Fix issue when stopping Python interpreter calls ``JDBCMixin.close()``, but it is finished with exceptions. (:github:pull:`156`)
20 changes: 20 additions & 0 deletions docs/changelog/0.9.5.rst
@@ -0,0 +1,20 @@
+0.9.5 (2023-10-10)
+==================
+
+Features
+--------
+
+- Add ``XML`` file format support. (:github:pull:`163`)
+- Tested compatibility with Spark 3.5.0. ``MongoDB`` and ``Excel`` are not supported yet, but other packages do. (:github:pull:`159`)
+
+
+Improvements
+------------
+
+- Add check to all DB and FileDF connections that Spark session is alive. (:github:pull:`164`)
+
+
+Bug Fixes
+---------
+
+- Fix ``Hive.check()`` behavior when Hive Metastore is not available. (:github:pull:`164`)
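The 0.9.5 notes above mention a new check that the Spark session is alive. A minimal sketch of such a guard is shown below. This is NOT onETL's actual code; the class name and attribute access are assumptions, based on the public PySpark behavior that ``SparkSession.sparkContext._jsc`` becomes ``None`` after ``spark.stop()``:

```python
class SparkSessionStoppedError(RuntimeError):
    """Raised when an operation is attempted on a stopped Spark session (hypothetical)."""


def check_spark_alive(spark) -> None:
    # Duck-typed: anything exposing sparkContext._jsc works here.
    # A live SparkSession has a JVM-side context; after spark.stop()
    # the _jsc handle is cleared to None.
    context = getattr(spark, "sparkContext", None)
    if context is None or getattr(context, "_jsc", None) is None:
        raise SparkSessionStoppedError("Spark session is not running")
```

A connection's read/write methods could call such a guard first, failing fast with a clear error instead of a cryptic py4j traceback.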
38 changes: 4 additions & 34 deletions docs/changelog/NEXT_RELEASE.rst
@@ -1,36 +1,6 @@
-.. copy this file with new release name
-.. then fill it up using towncrier build
-.. and add it to index.rst
+.. fill up this file using ``towncrier build``
+.. then delete everything up to the header with version number
+.. then rename file to ``{VERSION}.rst`` and add it to index.rst
+.. and restore ``NEXT_RELEASE.rst`` content as it was before running the command above
 
 .. towncrier release notes start
-
-0.9.4 (2023-09-26)
-==================
-
-Features
---------
-
-- Add ``if_exists="ignore"`` and ``error`` to ``Hive.WriteOptions`` (:github:pull:`143`)
-- Add ``if_exists="ignore"`` and ``error`` to ``JDBC.WriteOptions`` (:github:pull:`144`)
-- Add ``if_exists="ignore"`` and ``error`` to ``MongoDB.WriteOptions`` (:github:pull:`145`)
-- Add ``Excel`` file format support. (:github:pull:`148`)
-- Add ``Samba`` file connection.
-  It is now possible to download and upload files to Samba shared folders using ``FileDownloader``/``FileUploader``. (:github:pull:`150`)
-
-
-Improvements
-------------
-
-- Add documentation about different ways of passing packages to Spark session. (:github:pull:`151`)
-- Drastically improve ``Greenplum`` documentation:
-  * Added information about network ports, grants, ``pg_hba.conf`` and so on.
-  * Added interaction schemas for reading, writing and executing statements in Greenplum.
-  * Added recommendations about reading data from views and ``JOIN`` results from Greenplum. (:github:pull:`154`)
-- Make ``.fetch`` and ``.execute`` methods of DB connections thread-safe. Each thread works with its own connection. (:github:pull:`156`)
-- Call ``.close()`` on FileConnection then it is removed by garbage collector. (:github:pull:`156`)
-
-
-Bug Fixes
----------
-
-- Fix issue while stopping Python interpreter calls ``JDBCMixin.close()`` and prints exceptions to log. (:github:pull:`156`)
1 change: 1 addition & 0 deletions docs/changelog/index.rst
@@ -4,6 +4,7 @@
 
    DRAFT
    NEXT_RELEASE
+   0.9.5
    0.9.4
    0.9.3
    0.9.2