Test Spark 3.5.2 #306

Merged: 1 commit, merged on Aug 21, 2024
2 changes: 1 addition & 1 deletion .github/workflows/data/clickhouse/matrix.yml
@@ -11,7 +11,7 @@ min: &min
 max: &max
 clickhouse-image: clickhouse/clickhouse-server
 clickhouse-version: 24.6.3.70-alpine
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/core/matrix.yml
@@ -6,7 +6,7 @@ min: &min
 os: ubuntu-latest

 max: &max
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/hdfs/matrix.yml
@@ -8,7 +8,7 @@ min: &min

 max: &max
 hadoop-version: hadoop3-hdfs
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/hive/matrix.yml
@@ -6,7 +6,7 @@ min: &min
 os: ubuntu-latest

 max: &max
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/kafka/matrix.yml
@@ -12,7 +12,7 @@ min: &min
 max: &max
 kafka-version: 3.7.1
 pydantic-version: 2
-spark-version: 3.5.1
+spark-version: 3.5.2
 python-version: '3.12'
 java-version: 20
 os: ubuntu-latest
4 changes: 2 additions & 2 deletions .github/workflows/data/local-fs/matrix.yml
@@ -20,8 +20,8 @@ min_excel: &min_excel
 os: ubuntu-latest

 max: &max
-# Excel package currently has no release for 3.5.1
-spark-version: 3.5.0
+# Excel package currently has no release for 3.5.2
+spark-version: 3.5.1
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/mongodb/matrix.yml
@@ -9,7 +9,7 @@ min: &min

 max: &max
 mongodb-version: 7.0.12
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/mssql/matrix.yml
@@ -8,7 +8,7 @@ min: &min

 max: &max
 mssql-version: 2022-CU14-ubuntu-22.04
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/mysql/matrix.yml
@@ -10,7 +10,7 @@ min: &min

 max: &max
 mysql-version: 9.0.1
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/oracle/matrix.yml
@@ -12,7 +12,7 @@ max: &max
 oracle-image: gvenzl/oracle-free
 oracle-version: 23.4-slim-faststart
 db-name: FREEPDB1
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/postgres/matrix.yml
@@ -9,7 +9,7 @@ min: &min

 max: &max
 postgres-version: 16.3-alpine
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/s3/matrix.yml
@@ -10,7 +10,7 @@ min: &min

 max: &max
 minio-version: 2024.7.26
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion .github/workflows/data/teradata/matrix.yml
@@ -1,5 +1,5 @@
 max: &max
-spark-version: 3.5.1
+spark-version: 3.5.2
 pydantic-version: 2
 python-version: '3.12'
 java-version: 20
2 changes: 1 addition & 1 deletion CONTRIBUTING.rst
@@ -71,7 +71,7 @@ Create virtualenv and install dependencies:
 -r requirements/tests/postgres.txt \
 -r requirements/tests/oracle.txt \
 -r requirements/tests/pydantic-2.txt \
--r requirements/tests/spark-3.5.1.txt
+-r requirements/tests/spark-3.5.2.txt

 # TODO: remove after https://github.com/zqmillet/sphinx-plantuml/pull/4
 pip install sphinx-plantuml --no-deps
6 changes: 3 additions & 3 deletions README.rst
@@ -184,7 +184,7 @@ Compatibility matrix
 +--------------------------------------------------------------+-------------+-------------+-------+
 | `3.4.x <https://spark.apache.org/docs/3.4.3/#downloading>`_ | 3.7 - 3.12 | 8u362 - 20 | 2.12 |
 +--------------------------------------------------------------+-------------+-------------+-------+
-| `3.5.x <https://spark.apache.org/docs/3.5.1/#downloading>`_ | 3.8 - 3.12 | 8u371 - 20 | 2.12 |
+| `3.5.x <https://spark.apache.org/docs/3.5.2/#downloading>`_ | 3.8 - 3.12 | 8u371 - 20 | 2.12 |
 +--------------------------------------------------------------+-------------+-------------+-------+

 .. _pyspark-install:
@@ -199,7 +199,7 @@ or install PySpark explicitly:

 .. code:: bash

-pip install onetl pyspark==3.5.1 # install a specific PySpark version
+pip install onetl pyspark==3.5.2 # install a specific PySpark version

 or inject PySpark to ``sys.path`` in some other way BEFORE creating a class instance.
 **Otherwise connection object cannot be created.**
@@ -540,7 +540,7 @@ Read files directly from S3 path, convert them to dataframe, transform it and th
 setup_logging()

 # Initialize new SparkSession with Hadoop AWS libraries and Postgres driver loaded
-maven_packages = SparkS3.get_packages(spark_version="3.5.1") + Postgres.get_packages()
+maven_packages = SparkS3.get_packages(spark_version="3.5.2") + Postgres.get_packages()
 spark = (
 SparkSession.builder.appName("spark_app_onetl_demo")
 .config("spark.jars.packages", ",".join(maven_packages))
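For context, the README hunk above stops in the middle of the SparkSession builder. A rough sketch of how that S3 -> Postgres pipeline typically continues is shown below; the ``FileDFReader``/``DBWriter`` usage and all connection parameters are illustrative assumptions, not part of this diff.

```python
# Hedged sketch, not part of the PR: finish the builder and run an S3 -> Postgres pipeline.
# Class names and parameters (SparkS3, Postgres, FileDFReader, DBWriter, CSV) are assumed
# from the onETL documentation; hosts, credentials and paths are placeholders.
from onetl.connection import Postgres, SparkS3
from onetl.db import DBWriter
from onetl.file import FileDFReader
from onetl.file.format import CSV

spark = (
    SparkSession.builder.appName("spark_app_onetl_demo")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

s3 = SparkS3(
    host="s3.example.com",
    bucket="my-bucket",
    access_key="...",
    secret_key="...",
    spark=spark,
)
postgres = Postgres(
    host="postgres.example.com",
    database="mydb",
    user="onetl",
    password="...",
    spark=spark,
)

# Read CSV files from S3 into a DataFrame, then write that DataFrame to a Postgres table
reader = FileDFReader(connection=s3, source_path="/data/input", format=CSV(header=True))
df = reader.run()
DBWriter(connection=postgres, target="public.target_table").run(df)
```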
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -9,7 +9,7 @@ services:
 context: .
 target: base
 args:
-SPARK_VERSION: 3.5.1
+SPARK_VERSION: 3.5.2
 env_file: .env.docker
 volumes:
 - ./:/app/
2 changes: 1 addition & 1 deletion docker/Dockerfile
@@ -44,7 +44,7 @@ ENV PATH=${ONETL_USER_HOME}/.local/bin:${PATH}
 COPY --chown=onetl:onetl ./run_tests.sh ./pytest_runner.sh ./combine_coverage.sh /app/
 RUN chmod +x /app/run_tests.sh /app/pytest_runner.sh /app/combine_coverage.sh

-ARG SPARK_VERSION=3.5.1
+ARG SPARK_VERSION=3.5.2
 # Spark is heavy, and version change is quite rare
 COPY --chown=onetl:onetl ./requirements/tests/spark-${SPARK_VERSION}.txt /app/requirements/tests/
 RUN pip install -r /app/requirements/tests/spark-${SPARK_VERSION}.txt
1 change: 1 addition & 0 deletions docs/changelog/next_release/306.feature.rst
@@ -0,0 +1 @@
+Update ``Excel`` package from ``0.20.3`` to ``0.20.4``, to include Spark 3.5.1 support.
4 changes: 2 additions & 2 deletions docs/connection/db_connection/clickhouse/types.rst
@@ -106,8 +106,8 @@ References
 Here you can find source code with type conversions:

 * `Clickhouse -> JDBC <https://github.com/ClickHouse/clickhouse-java/blob/0.3.2/clickhouse-jdbc/src/main/java/com/clickhouse/jdbc/JdbcTypeMapping.java#L39-L176>`_
-* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L307>`_
-* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L141-L164>`_
+* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L307>`_
+* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L141-L164>`_
 * `JDBC -> Clickhouse <https://github.com/ClickHouse/clickhouse-java/blob/0.3.2/clickhouse-jdbc/src/main/java/com/clickhouse/jdbc/JdbcTypeMapping.java#L185-L311>`_

 Supported types
4 changes: 2 additions & 2 deletions docs/connection/db_connection/mssql/types.rst
@@ -101,8 +101,8 @@ References
 Here you can find source code with type conversions:

 * `MSSQL -> JDBC <https://github.com/microsoft/mssql-jdbc/blob/v12.2.0/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerResultSetMetaData.java#L117-L170>`_
-* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala#L102-L119>`_
-* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala#L121-L130>`_
+* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala#L117-L134>`_
+* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala#L136-L145>`_
 * `JDBC -> MSSQL <https://github.com/microsoft/mssql-jdbc/blob/v12.2.0/src/main/java/com/microsoft/sqlserver/jdbc/DataTypes.java#L625-L676>`_

 Supported types
4 changes: 2 additions & 2 deletions docs/connection/db_connection/mysql/types.rst
@@ -97,8 +97,8 @@ References
 Here you can find source code with type conversions:

 * `MySQL -> JDBC <https://github.com/mysql/mysql-connector-j/blob/8.0.33/src/main/core-api/java/com/mysql/cj/MysqlType.java#L44-L623>`_
-* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L89-L106>`_
-* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L182-L188>`_
+* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L104-L132>`_
+* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L204-L211>`_
 * `JDBC -> MySQL <https://github.com/mysql/mysql-connector-j/blob/8.0.33/src/main/core-api/java/com/mysql/cj/MysqlType.java#L625-L867>`_

 Supported types
4 changes: 2 additions & 2 deletions docs/connection/db_connection/oracle/types.rst
@@ -101,8 +101,8 @@ See `List of Oracle types <https://docs.oracle.com/en/database/oracle/oracle-dat

 Here you can find source code with type conversions:

-* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L83-L109>`_
-* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L111-L123>`_
+* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L83-L109>`_
+* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L111-L123>`_

 Numeric types
 ~~~~~~~~~~~~~
4 changes: 2 additions & 2 deletions docs/connection/db_connection/postgres/types.rst
@@ -109,8 +109,8 @@ See `List of Postgres types <https://www.postgresql.org/docs/current/datatype.ht
 Here you can find source code with type conversions:

 * `Postgres <-> JDBC <https://github.com/pgjdbc/pgjdbc/blob/REL42.6.0/pgjdbc/src/main/java/org/postgresql/jdbc/TypeInfoCache.java#L78-L112>`_
-* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L50-L106>`_
-* `Spark -> JDBC <https://github.com/apache/spark/blob/ce5ddad990373636e94071e7cef2f31021add07b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L116-L130>`_
+* `JDBC -> Spark <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L50-L106>`_
+* `Spark -> JDBC <https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L116-L130>`_

 Numeric types
 ~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion onetl/_metrics/extract.py
@@ -70,7 +70,7 @@ def extract_metrics_from_execution(execution: SparkListenerExecution) -> SparkCo
 disk_spilled_bytes += stage.metrics.disk_spilled_bytes
 result_size_bytes += stage.metrics.result_size_bytes

-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L467-L473
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L467-L473
 input_file_count = (
 _get_int(execution.metrics, SparkSQLMetricNames.NUMBER_OF_FILES_READ)
 or _get_int(execution.metrics, SparkSQLMetricNames.STATIC_NUMBER_OF_FILES_READ)
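The fallback chain above relies on a small helper that looks up a metric by name and coerces it to an integer. The actual ``_get_int`` implementation is not part of this hunk; a plausible sketch of such a helper:

```python
# Sketch only: the real _get_int in onetl/_metrics/extract.py may differ.
# Returning None when the metric is missing or not a clean integer lets the
# `or`-chain above fall through to the next metric name.
def _get_int(metrics: dict, key) -> int | None:
    try:
        return int(metrics[key])
    except (KeyError, TypeError, ValueError):
        return None
```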
2 changes: 1 addition & 1 deletion onetl/_metrics/listener/base.py
@@ -16,7 +16,7 @@
 class BaseSparkListener:
 """Base no-op SparkListener implementation.

-See `SparkListener <https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListener.html>`_ interface.
+See `SparkListener <https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListener.html>`_ interface.
 """

 spark: SparkSession
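A Python-side listener like this has to expose the Java ``SparkListenerInterface`` to the JVM through Py4J before it can be attached to a session. The general pattern such a class relies on might look roughly as follows; this is an assumption for illustration, not the onETL implementation, and it requires the Py4J callback server to be running.

```python
# Rough illustration of attaching a Python listener to Spark via Py4J callbacks.
# Not the onETL code: the real SparkListenerInterface has many more callbacks,
# and the Py4J callback server must be started for the JVM to call back into Python.
from pyspark.sql import SparkSession


class NoOpListener:
    def onJobStart(self, event):  # one of the many SparkListenerInterface callbacks
        pass

    def __getattr__(self, name):  # treat every other on*(...) callback as a no-op
        return lambda *args, **kwargs: None

    class Java:
        implements = ["org.apache.spark.scheduler.SparkListenerInterface"]


spark = SparkSession.builder.getOrCreate()
listener = NoOpListener()
# addSparkListener lives on the Scala SparkContext, reached through the Java gateway
spark.sparkContext._jsc.sc().addSparkListener(listener)
```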
12 changes: 6 additions & 6 deletions onetl/_metrics/listener/execution.py
@@ -22,18 +22,18 @@ class SparkSQLMetricNames(str, Enum): # noqa: WPS338
 # Metric names passed to SQLMetrics.createMetric(...)
 # But only those we're interested in.

-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L233C55-L233C87
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L231
 NUMBER_OF_PARTITIONS_READ = "number of partitions read"

-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L225-L227
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L225-L227
 NUMBER_OF_FILES_READ = "number of files read"
 SIZE_OF_FILES_READ = "size of files read"

-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L455-L456
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L225-L227
 STATIC_NUMBER_OF_FILES_READ = "static number of files read"
 STATIC_SIZE_OF_FILES_READ = "static size of files read"

-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala#L241-L246
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala#L241-L246
 NUMBER_OF_DYNAMIC_PART = "number of dynamic part"
 NUMBER_OF_WRITTEN_FILES = "number of written files"

@@ -62,11 +62,11 @@ def jobs(self) -> list[SparkListenerJob]:
 return result

 def on_execution_start(self, event):
-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L44-L58
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L44-L58
 self.status = SparkListenerExecutionStatus.STARTED

 def on_execution_end(self, event):
-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L61-L83
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L61-L83
 for job in self._jobs.values():
 if job.status == SparkListenerJobStatus.FAILED:
 self.status = SparkListenerExecutionStatus.FAILED
4 changes: 2 additions & 2 deletions onetl/_metrics/listener/job.py
@@ -38,8 +38,8 @@ def stages(self) -> list[SparkListenerStage]:

 @classmethod
 def create(cls, event):
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListenerJobSubmitted.html
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListenerJobCompleted.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListenerJobSubmitted.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListenerJobCompleted.html
 result = cls(
 id=event.jobId(),
 description=event.properties().get("spark.job.description"),
2 changes: 1 addition & 1 deletion onetl/_metrics/listener/listener.py
@@ -73,7 +73,7 @@ def onExecutionEnd(self, event):

 # Get execution metrics from SQLAppStatusStore,
 # as SparkListenerSQLExecutionEnd event does not provide them:
-# https://github.com/apache/spark/blob/v3.5.1/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala
+# https://github.com/apache/spark/blob/v3.5.2/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala
 session_status_store = self.spark._jsparkSession.sharedState().statusStore() # noqa: WPS437
 raw_execution = session_status_store.execution(execution.id).get()
 metrics = raw_execution.metrics()
6 changes: 3 additions & 3 deletions onetl/_metrics/listener/stage.py
@@ -21,7 +21,7 @@ def __str__(self):

 @dataclass
 class SparkListenerStage:
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/StageInfo.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/StageInfo.html
 id: int
 status: SparkListenerStageStatus = SparkListenerStageStatus.PENDING
 metrics: SparkListenerTaskMetrics = field(default_factory=SparkListenerTaskMetrics, repr=False, init=False)
@@ -39,11 +39,11 @@ def create(cls, stage_info):
 return cls(id=stage_info.stageId())

 def on_stage_start(self, event):
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListenerStageSubmitted.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListenerStageSubmitted.html
 self.status = SparkListenerStageStatus.ACTIVE

 def on_stage_end(self, event):
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListenerStageCompleted.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListenerStageCompleted.html
 stage_info = event.stageInfo()
 if stage_info.failureReason().isDefined():
 self.status = SparkListenerStageStatus.FAILED
6 changes: 3 additions & 3 deletions onetl/_metrics/listener/task.py
@@ -81,14 +81,14 @@ class SparkListenerTask:

 @classmethod
 def create(cls, task_info):
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/TaskInfo.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/TaskInfo.html
 return cls(id=task_info.taskId())

 def on_task_start(self, event):
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListenerTaskStart.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListenerTaskStart.html
 self.status = SparkListenerTaskStatus(event.taskInfo().status())

 def on_task_end(self, event):
-# https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/scheduler/SparkListenerTaskEnd.html
+# https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/scheduler/SparkListenerTaskEnd.html
 self.status = SparkListenerTaskStatus(event.taskInfo().status())
 self.metrics = SparkListenerTaskMetrics.create(event.taskMetrics())
2 changes: 1 addition & 1 deletion onetl/_util/spark.py
@@ -143,7 +143,7 @@ def estimate_dataframe_size(spark_session: SparkSession, df: DataFrame) -> int:
 """
 Estimate in-memory DataFrame size in bytes. If cannot be estimated, return 0.

-Using Spark's `SizeEstimator <https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/util/SizeEstimator.html>`_.
+Using Spark's `SizeEstimator <https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/util/SizeEstimator.html>`_.
 """
 try:
 size_estimator = spark_session._jvm.org.apache.spark.util.SizeEstimator # type: ignore[union-attr]
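The hunk cuts off right after resolving the ``SizeEstimator`` class. The body that follows is not shown in the diff; a hedged sketch of how the full function typically looks, with everything after the visible line assumed:

```python
from pyspark.sql import DataFrame, SparkSession


def estimate_dataframe_size(spark_session: SparkSession, df: DataFrame) -> int:
    """Sketch of the function this hunk touches; only the docstring link changed in the PR."""
    try:
        size_estimator = spark_session._jvm.org.apache.spark.util.SizeEstimator  # type: ignore[union-attr]
        # estimate() expects a JVM object, so pass the Java DataFrame behind the PySpark one
        return size_estimator.estimate(df._jdf)
    except Exception:
        return 0  # per the docstring: "If cannot be estimated, return 0"
```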
4 changes: 2 additions & 2 deletions onetl/connection/db_connection/kafka/connection.py
@@ -332,7 +332,7 @@ def write_df_to_target(
 write_options.update(options.dict(by_alias=True, exclude_none=True, exclude={"if_exists"}))
 write_options["topic"] = target

-# As of Apache Spark version 3.5.0, the mode 'error' is not functioning as expected.
+# As of Apache Spark version 3.5.2, the mode 'error' is not functioning as expected.
 # This issue has been reported and can be tracked at:
 # https://issues.apache.org/jira/browse/SPARK-44774
 mode = options.if_exists
@@ -418,7 +418,7 @@ def get_packages(
 from onetl.connection import Kafka

 Kafka.get_packages(spark_version="3.2.4")
-Kafka.get_packages(spark_version="3.2.4", scala_version="2.13")
+Kafka.get_packages(spark_version="3.2.4", scala_version="2.12")

 """
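The ``get_packages`` examples in the docstring only build the Maven coordinate list. A short sketch of the usual way those coordinates are fed into a SparkSession before creating the connection; the broker addresses and cluster name are placeholders, not part of this PR:

```python
# Minimal sketch: put the Kafka connector coordinates on the Spark classpath
# before the session is created, then build the onETL Kafka connection.
from pyspark.sql import SparkSession
from onetl.connection import Kafka

maven_packages = Kafka.get_packages(spark_version="3.5.2", scala_version="2.12")
spark = (
    SparkSession.builder.appName("kafka_demo")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

kafka = Kafka(
    addresses=["broker-1:9092", "broker-2:9092"],  # placeholder brokers
    cluster="my-cluster",
    spark=spark,
)
```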
6 changes: 3 additions & 3 deletions onetl/connection/file_df_connection/spark_s3/connection.py
@@ -133,7 +133,7 @@ class SparkS3(SparkFileDFConnection):
 from pyspark.sql import SparkSession

 # Create Spark session with Hadoop AWS libraries loaded
-maven_packages = SparkS3.get_packages(spark_version="3.5.0")
+maven_packages = SparkS3.get_packages(spark_version="3.5.2")
 # Some dependencies are not used, but downloading takes a lot of time. Skipping them.
 excluded_packages = [
 "com.google.cloud.bigdataoss:gcs-connector",
@@ -236,8 +236,8 @@ def get_packages(

 from onetl.connection import SparkS3

-SparkS3.get_packages(spark_version="3.5.0")
-SparkS3.get_packages(spark_version="3.5.0", scala_version="2.12")
+SparkS3.get_packages(spark_version="3.5.2")
+SparkS3.get_packages(spark_version="3.5.2", scala_version="2.12")

 """
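The docstring above introduces an ``excluded_packages`` list but the hunk ends before showing how it is applied. A hedged sketch of the usual pattern: Maven coordinates go to ``spark.jars.packages`` and exclusions to ``spark.jars.excludes`` (a standard Spark property); the exclusion list here is just the one example visible in the docstring.

```python
# Sketch of wiring get_packages() output and exclusions into a SparkSession;
# assumed usage based on the docstring fragment above, not code from this PR.
from pyspark.sql import SparkSession
from onetl.connection import SparkS3

maven_packages = SparkS3.get_packages(spark_version="3.5.2")
excluded_packages = [
    "com.google.cloud.bigdataoss:gcs-connector",  # unused transitive dependency, skipped to speed up downloads
]
spark = (
    SparkSession.builder.appName("spark_s3_demo")
    .config("spark.jars.packages", ",".join(maven_packages))
    .config("spark.jars.excludes", ",".join(excluded_packages))
    .getOrCreate()
)
```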