Skip to content

Commit

Permalink
[DOP-13855] - update clickhouse driver (#249)
Browse files Browse the repository at this point in the history
* [DOP-13855] - update clickhouse driver

* [DOP-13855] - update clickhouse get_packages() tests

* [DOP-13855] Fix Clickhouse HWM tests

* [DOP-13855] - update clickhouse get_packages() tests

* [DOP-13855] Fix Clickhouse HWM tests

* [DOP-13855] Fix Clickhouse HWM tests

* [DOP-13855] - update clickhouse get_packages() tests

* [DOP-13855] - fix Version.min_digits() method

* [DOP-13855] - update numeric mapping

* [DOP-13855] - update temporal mapping

---------

Co-authored-by: Мартынов Максим Сергеевич <msmarty5@mts.ru>
  • Loading branch information
maxim-lixakov and dolfinus authored Apr 15, 2024
1 parent 702400e commit 5b370bc
Show file tree
Hide file tree
Showing 12 changed files with 166 additions and 64 deletions.
1 change: 1 addition & 0 deletions docs/changelog/next_release/249.breaking.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Updated the Clickhouse JDBC driver from ``ru.yandex.clickhouse:clickhouse-jdbc:0.3.2`` to `com.clickhouse:clickhouse-jdbc:0.6.0 <https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.6.0>`_.
1 change: 1 addition & 0 deletions docs/changelog/next_release/249.feature.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Allow passing custom JDBC driver version to ``Clickhouse.get_packages(package_version=...)``.
57 changes: 26 additions & 31 deletions docs/connection/db_connection/clickhouse/types.rst
Original file line number Diff line number Diff line change
Expand Up @@ -125,11 +125,9 @@ Numeric types
~~~~~~~~~~~~~

+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| Clickhouse type (read) | Spark type | Clickhousetype (write) | Clickhouse type (create) |
| Clickhouse type (read) | Spark type | Clickhouse type (write) | Clickhouse type (create) |
+================================+===================================+===============================+===============================+
| ``Bool`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``-`` | ``BooleanType()`` | ``UInt64`` | ``UInt64`` |
| ``Bool`` | ``BooleanType()`` | ``UInt64`` | ``UInt64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Decimal`` | ``DecimalType(P=10, S=0)`` | ``Decimal(P=10, S=0)`` | ``Decimal(P=10, S=0)`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
Expand All @@ -147,11 +145,9 @@ Numeric types
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Decimal256(S=0..76)`` | unsupported [3]_ | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Float32`` | ``DoubleType()`` | ``Float64`` | ``Float64`` |
+--------------------------------+ | | |
| ``Float64`` | | | |
| ``Float32`` | ``FloatType()`` | ``Float32`` | ``Float32`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``-`` | ``FloatType()`` | ``Float32`` | ``Float32`` |
| ``Float64`` | ``DoubleType()`` | ``Float64`` | ``Float64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int8`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+ | | |
Expand All @@ -161,7 +157,7 @@ Numeric types
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int64`` | ``LongType()`` | ``Int64`` | ``Int64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int128`` | ``DecimalType(20,0)`` | ``Decimal(20,0)`` | ``Decimal(20,0)`` |
| ``Int128`` | unsupported [3]_ | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int256`` | unsupported [3]_ | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
Expand All @@ -170,16 +166,16 @@ Numeric types
| ``-`` | ``ShortType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt8`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+ | | |
| ``UInt16`` | | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt16`` | ``LongType()`` | ``Int64`` | ``Int64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt32`` | ``DecimalType(20,0)`` | ``Decimal(20,0)`` | ``Decimal(20,0)`` |
+--------------------------------+ | | |
| ``UInt64`` | | | |
+--------------------------------+ | | |
| ``UInt128`` | | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt256`` | unsupported [3]_ | | |
| ``UInt128`` | unsupported [3]_ | | |
+--------------------------------+ | | |
| ``UInt256`` | | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+

.. [3]
Expand All @@ -198,33 +194,32 @@ Notes:
* ``TIMESTAMP`` is alias for ``DateTime32``, but ``TIMESTAMP(N)`` is alias for ``DateTime64(N)``

+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| Clickhouse type (read) | Spark type | Clickhousetype (write) | Clickhouse type (create) |
| Clickhouse type (read) | Spark type | Clickhouse type (write) | Clickhouse type (create) |
+===================================+======================================+==================================+===============================+
| ``Date`` | ``DateType()`` | ``Date`` | ``Date`` |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``Date32`` | unsupported | | |
| ``Date32`` | ``DateType()`` | ``Date`` | ``Date`` |
| | | | **cannot be inserted** [6]_ |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``DateTime32``, seconds | ``TimestampType()``, microseconds | ``DateTime64(6)``, microseconds | ``DateTime32``, seconds, |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``DateTime64(3)``, milliseconds | ``TimestampType()``, microseconds | ``DateTime64(6)``, microseconds | ``DateTime32``, seconds, |
| | | | **precision loss** [5]_ |
| | | | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``DateTime64(6)``, microseconds | ``TimestampType()``, microseconds | ``DateTime64(6)``, microseconds | ``DateTime32``, seconds, |
+-----------------------------------+--------------------------------------+----------------------------------+ **cannot be inserted** [6]_ |
| ``DateTime64(7..9)``, nanoseconds | ``TimestampType()``, microseconds, | ``DateTime64(6)``, microseconds, | |
| | **precision loss** [4]_ | **precision loss** [4]_ | |
| | | | |
| ``DateTime32``, seconds | ``TimestampType()`` | ``DateTime64(6)``, microseconds | ``DateTime32`` |
+-----------------------------------+--------------------------------------+----------------------------------+ seconds |
| ``DateTime64(3)``, milliseconds | ``TimestampType()`` | ``DateTime64(6)``, microseconds | **precision loss** [4]_ |
+-----------------------------------+--------------------------------------+----------------------------------+ |
| ``DateTime64(6)``, microseconds | ``TimestampType()`` | ``DateTime64(6)``, microseconds | |
+-----------------------------------+--------------------------------------+----------------------------------+ |
| ``-`` | ``TimestampNTZType()``, microseconds | ``DateTime64(6)`` | |
| ``DateTime64(7..9)``, nanoseconds | ``TimestampType()`` | ``DateTime64(6)`` | |
| | | microseconds | |
| | | **precision loss** [4]_ | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``IntervalNanosecond`` | unsupported | | |
| ``-`` | ``TimestampNTZType()`` | ``DateTime64(6)`` | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``IntervalNanosecond`` | ``LongType()`` | ```Int64`` | ``Int64`` |
+-----------------------------------+ | | |
| ``IntervalMicrosecond`` | | | |
+-----------------------------------+ | | |
| ``IntervalMillisecond`` | | | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``IntervalSecond`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+-----------------------------------+ | | |
| ``IntervalSecond`` | | | |
+-----------------------------------+ | | |
| ``IntervalMinute`` | | | |
+-----------------------------------+ | | |
Expand Down
2 changes: 1 addition & 1 deletion onetl/VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.10.3
0.11.0
12 changes: 6 additions & 6 deletions onetl/_util/version.py
Original file line number Diff line number Diff line change
Expand Up @@ -195,17 +195,17 @@ def min_digits(self, num_parts: int) -> Version:
>>> Version("5.6.7").min_digits(3)
Version('5.6.7')
>>> Version("5.6.7").min_digits(2)
Version('5.6')
Version('5.6.7')
>>> Version("5.6").min_digits(3)
Traceback (most recent call last):
...
ValueError: Version '5.6' does not have enough numeric components for requested format.
ValueError: Version '5.6' does not have enough numeric components for requested format (expected at least 3).
"""
if len(self._numeric_parts) < num_parts:
raise ValueError(f"Version '{self}' does not have enough numeric components for requested format.")
truncated_parts = self._numeric_parts[:num_parts]
truncated_str = ".".join(str(part) for part in truncated_parts)
return Version(truncated_str)
raise ValueError(
f"Version '{self}' does not have enough numeric components for requested format (expected at least {num_parts}).",
)
return self

def format(self, format_string: str) -> str:
"""
Expand Down
51 changes: 40 additions & 11 deletions onetl/connection/db_connection/clickhouse/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,10 @@
from __future__ import annotations

import logging
import warnings
from typing import ClassVar, Optional

from onetl._util.classproperty import classproperty
from onetl._util.version import Version
from onetl.connection.db_connection.clickhouse.dialect import ClickhouseDialect
from onetl.connection.db_connection.jdbc_connection import JDBCConnection
from onetl.connection.db_connection.jdbc_mixin import JDBCStatementType
Expand All @@ -28,7 +28,7 @@ class Config:
class Clickhouse(JDBCConnection):
"""Clickhouse JDBC connection. |support_hooks|
Based on Maven package ``ru.yandex.clickhouse:clickhouse-jdbc:0.3.2``
Based on Maven package `com.clickhouse:clickhouse-jdbc:0.6.0 <https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.6.0>`_
(`official Clickhouse JDBC driver <https://github.com/ClickHouse/clickhouse-jdbc>`_).
.. warning::
Expand Down Expand Up @@ -104,32 +104,61 @@ class Clickhouse(JDBCConnection):
Extra = ClickhouseExtra
Dialect = ClickhouseDialect

DRIVER: ClassVar[str] = "ru.yandex.clickhouse.ClickHouseDriver"
DRIVER: ClassVar[str] = "com.clickhouse.jdbc.ClickHouseDriver"

@slot
@classmethod
def get_packages(cls) -> list[str]:
def get_packages(
cls,
package_version: str | None = None,
apache_http_client_version: str | None = None,
) -> list[str]:
"""
Get package names to be downloaded by Spark. |support_hooks|
Parameters
----------
package_version : str , optional
ClickHouse JDBC version client packages. Defaults to ``0.6.0``.
apache_http_client_version : str, optional
Apache HTTP Client version package. Defaults to ``5.3.1``.
Examples
--------
.. code:: python
from onetl.connection import Clickhouse
Clickhouse.get_packages()
Clickhouse.get_packages(package_version="0.6.0", apache_http_client_version="5.3.1")
.. note::
Spark does not support ``.jar`` classifiers, so it is not possible to pass
``com.clickhouse:clickhouse-jdbc:0.6.0:all`` to install all required packages.
"""
return ["ru.yandex.clickhouse:clickhouse-jdbc:0.3.2"]
package_version_obj = Version(package_version).min_digits(3) if package_version else Version("0.6.0")
apache_http_client_version_obj = (
Version(apache_http_client_version).min_digits(3) if apache_http_client_version else Version("5.3.1")
)

result = [
f"com.clickhouse:clickhouse-jdbc:{package_version_obj}",
f"com.clickhouse:clickhouse-http-client:{package_version_obj}",
]

if package_version_obj >= Version("0.5.0"):
# before 0.5.0 builtin Java HTTP Client was used
result.append(f"org.apache.httpcomponents.client5:httpclient5:{apache_http_client_version_obj}")

return result

@classproperty
def package(cls) -> str:
"""Get package name to be downloaded by Spark."""
msg = "`Clickhouse.package` will be removed in 1.0.0, use `Clickhouse.get_packages()` instead"
warnings.warn(msg, UserWarning, stacklevel=3)
return "ru.yandex.clickhouse:clickhouse-jdbc:0.3.2"
def package(self) -> str:
"""Get a single string of package names to be downloaded by Spark for establishing a Clickhouse connection."""
return "com.clickhouse:clickhouse-jdbc:0.6.0,com.clickhouse:clickhouse-http-client:0.6.0,org.apache.httpcomponents.client5:httpclient5:5.3.1"

@property
def jdbc_url(self) -> str:
Expand Down
2 changes: 1 addition & 1 deletion onetl/connection/db_connection/greenplum/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,7 @@ def get_packages(
scala_ver = Version(scala_version).min_digits(2)
elif spark_version:
spark_ver = Version(spark_version).min_digits(2)
if spark_ver > Version("3.2") or spark_ver < Version("2.3"):
if spark_ver >= Version("3.3") or spark_ver < Version("2.3"):
raise ValueError(f"Spark version must be 2.3.x - 3.2.x, got {spark_ver}")
scala_ver = get_default_scala_version(spark_ver)
else:
Expand Down
Loading

0 comments on commit 5b370bc

Please sign in to comment.