Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOP-13855] - update clickhouse driver #249

Merged
merged 10 commits into from
Apr 15, 2024
1 change: 1 addition & 0 deletions docs/changelog/next_release/249.breaking.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Updated the Clickhouse JDBC driver from ``ru.yandex.clickhouse:clickhouse-jdbc:0.3.2`` to `com.clickhouse:clickhouse-jdbc:0.6.0 <https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.6.0>`_.
1 change: 1 addition & 0 deletions docs/changelog/next_release/249.feature.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Allow passing custom JDBC driver version to ``Clickhouse.get_packages(package_version=...)``.
57 changes: 26 additions & 31 deletions docs/connection/db_connection/clickhouse/types.rst
Original file line number Diff line number Diff line change
Expand Up @@ -125,11 +125,9 @@ Numeric types
~~~~~~~~~~~~~

+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| Clickhouse type (read) | Spark type | Clickhousetype (write) | Clickhouse type (create) |
| Clickhouse type (read) | Spark type | Clickhouse type (write) | Clickhouse type (create) |
+================================+===================================+===============================+===============================+
| ``Bool`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``-`` | ``BooleanType()`` | ``UInt64`` | ``UInt64`` |
| ``Bool`` | ``BooleanType()`` | ``UInt64`` | ``UInt64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Decimal`` | ``DecimalType(P=10, S=0)`` | ``Decimal(P=10, S=0)`` | ``Decimal(P=10, S=0)`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
Expand All @@ -147,11 +145,9 @@ Numeric types
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Decimal256(S=0..76)`` | unsupported [3]_ | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Float32`` | ``DoubleType()`` | ``Float64`` | ``Float64`` |
+--------------------------------+ | | |
| ``Float64`` | | | |
| ``Float32`` | ``FloatType()`` | ``Float32`` | ``Float32`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``-`` | ``FloatType()`` | ``Float32`` | ``Float32`` |
| ``Float64`` | ``DoubleType()`` | ``Float64`` | ``Float64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int8`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+ | | |
Expand All @@ -161,7 +157,7 @@ Numeric types
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int64`` | ``LongType()`` | ``Int64`` | ``Int64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int128`` | ``DecimalType(20,0)`` | ``Decimal(20,0)`` | ``Decimal(20,0)`` |
| ``Int128`` | unsupported [3]_ | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``Int256`` | unsupported [3]_ | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
Expand All @@ -170,16 +166,16 @@ Numeric types
| ``-`` | ``ShortType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt8`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+--------------------------------+ | | |
| ``UInt16`` | | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt16`` | ``LongType()`` | ``Int64`` | ``Int64`` |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt32`` | ``DecimalType(20,0)`` | ``Decimal(20,0)`` | ``Decimal(20,0)`` |
+--------------------------------+ | | |
| ``UInt64`` | | | |
+--------------------------------+ | | |
| ``UInt128`` | | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+
| ``UInt256`` | unsupported [3]_ | | |
| ``UInt128`` | unsupported [3]_ | | |
+--------------------------------+ | | |
| ``UInt256`` | | | |
+--------------------------------+-----------------------------------+-------------------------------+-------------------------------+

.. [3]
Expand All @@ -198,33 +194,32 @@ Notes:
* ``TIMESTAMP`` is alias for ``DateTime32``, but ``TIMESTAMP(N)`` is alias for ``DateTime64(N)``

+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| Clickhouse type (read) | Spark type | Clickhousetype (write) | Clickhouse type (create) |
| Clickhouse type (read) | Spark type | Clickhouse type (write) | Clickhouse type (create) |
+===================================+======================================+==================================+===============================+
| ``Date`` | ``DateType()`` | ``Date`` | ``Date`` |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``Date32`` | unsupported | | |
| ``Date32`` | ``DateType()`` | ``Date`` | ``Date`` |
| | | | **cannot be inserted** [6]_ |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``DateTime32``, seconds | ``TimestampType()``, microseconds | ``DateTime64(6)``, microseconds | ``DateTime32``, seconds, |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``DateTime64(3)``, milliseconds | ``TimestampType()``, microseconds | ``DateTime64(6)``, microseconds | ``DateTime32``, seconds, |
| | | | **precision loss** [5]_ |
| | | | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``DateTime64(6)``, microseconds | ``TimestampType()``, microseconds | ``DateTime64(6)``, microseconds | ``DateTime32``, seconds, |
+-----------------------------------+--------------------------------------+----------------------------------+ **cannot be inserted** [6]_ |
| ``DateTime64(7..9)``, nanoseconds | ``TimestampType()``, microseconds, | ``DateTime64(6)``, microseconds, | |
| | **precision loss** [4]_ | **precision loss** [4]_ | |
| | | | |
| ``DateTime32``, seconds | ``TimestampType()`` | ``DateTime64(6)``, microseconds | ``DateTime32`` |
+-----------------------------------+--------------------------------------+----------------------------------+ seconds |
| ``DateTime64(3)``, milliseconds | ``TimestampType()`` | ``DateTime64(6)``, microseconds | **precision loss** [4]_ |
+-----------------------------------+--------------------------------------+----------------------------------+ |
| ``DateTime64(6)``, microseconds | ``TimestampType()`` | ``DateTime64(6)``, microseconds | |
+-----------------------------------+--------------------------------------+----------------------------------+ |
| ``-`` | ``TimestampNTZType()``, microseconds | ``DateTime64(6)`` | |
| ``DateTime64(7..9)``, nanoseconds | ``TimestampType()`` | ``DateTime64(6)`` | |
| | | microseconds | |
| | | **precision loss** [4]_ | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``IntervalNanosecond`` | unsupported | | |
| ``-`` | ``TimestampNTZType()`` | ``DateTime64(6)`` | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``IntervalNanosecond`` | ``LongType()`` | ```Int64`` | ``Int64`` |
+-----------------------------------+ | | |
| ``IntervalMicrosecond`` | | | |
+-----------------------------------+ | | |
| ``IntervalMillisecond`` | | | |
+-----------------------------------+--------------------------------------+----------------------------------+-------------------------------+
| ``IntervalSecond`` | ``IntegerType()`` | ``Int32`` | ``Int32`` |
+-----------------------------------+ | | |
| ``IntervalSecond`` | | | |
+-----------------------------------+ | | |
| ``IntervalMinute`` | | | |
+-----------------------------------+ | | |
Expand Down
2 changes: 1 addition & 1 deletion onetl/VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.10.3
0.11.0
12 changes: 6 additions & 6 deletions onetl/_util/version.py
Original file line number Diff line number Diff line change
Expand Up @@ -195,17 +195,17 @@ def min_digits(self, num_parts: int) -> Version:
>>> Version("5.6.7").min_digits(3)
Version('5.6.7')
>>> Version("5.6.7").min_digits(2)
Version('5.6')
Version('5.6.7')
>>> Version("5.6").min_digits(3)
Traceback (most recent call last):
...
ValueError: Version '5.6' does not have enough numeric components for requested format.
ValueError: Version '5.6' does not have enough numeric components for requested format (expected at least 3).
"""
if len(self._numeric_parts) < num_parts:
raise ValueError(f"Version '{self}' does not have enough numeric components for requested format.")
truncated_parts = self._numeric_parts[:num_parts]
truncated_str = ".".join(str(part) for part in truncated_parts)
return Version(truncated_str)
raise ValueError(
f"Version '{self}' does not have enough numeric components for requested format (expected at least {num_parts}).",
)
return self

def format(self, format_string: str) -> str:
"""
Expand Down
51 changes: 40 additions & 11 deletions onetl/connection/db_connection/clickhouse/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,10 @@
from __future__ import annotations

import logging
import warnings
from typing import ClassVar, Optional

from onetl._util.classproperty import classproperty
from onetl._util.version import Version
from onetl.connection.db_connection.clickhouse.dialect import ClickhouseDialect
from onetl.connection.db_connection.jdbc_connection import JDBCConnection
from onetl.connection.db_connection.jdbc_mixin import JDBCStatementType
Expand All @@ -28,7 +28,7 @@ class Config:
class Clickhouse(JDBCConnection):
"""Clickhouse JDBC connection. |support_hooks|

Based on Maven package ``ru.yandex.clickhouse:clickhouse-jdbc:0.3.2``
Based on Maven package `com.clickhouse:clickhouse-jdbc:0.6.0 <https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.6.0>`_
(`official Clickhouse JDBC driver <https://github.com/ClickHouse/clickhouse-jdbc>`_).

.. warning::
Expand Down Expand Up @@ -104,32 +104,61 @@ class Clickhouse(JDBCConnection):
Extra = ClickhouseExtra
Dialect = ClickhouseDialect

DRIVER: ClassVar[str] = "ru.yandex.clickhouse.ClickHouseDriver"
DRIVER: ClassVar[str] = "com.clickhouse.jdbc.ClickHouseDriver"

@slot
@classmethod
def get_packages(cls) -> list[str]:
def get_packages(
cls,
package_version: str | None = None,
apache_http_client_version: str | None = None,
) -> list[str]:
"""
Get package names to be downloaded by Spark. |support_hooks|

Parameters
----------
package_version : str , optional
ClickHouse JDBC version client packages. Defaults to ``0.6.0``.

apache_http_client_version : str, optional
Apache HTTP Client version package. Defaults to ``5.3.1``.

Examples
--------

.. code:: python

from onetl.connection import Clickhouse

Clickhouse.get_packages()
Clickhouse.get_packages(package_version="0.6.0", apache_http_client_version="5.3.1")

.. note::

Spark does not support ``.jar`` classifiers, so it is not possible to pass
``com.clickhouse:clickhouse-jdbc:0.6.0:all`` to install all required packages.

"""
return ["ru.yandex.clickhouse:clickhouse-jdbc:0.3.2"]
package_version_obj = Version(package_version).min_digits(3) if package_version else Version("0.6.0")
apache_http_client_version_obj = (
Version(apache_http_client_version).min_digits(3) if apache_http_client_version else Version("5.3.1")
)

result = [
f"com.clickhouse:clickhouse-jdbc:{package_version_obj}",
f"com.clickhouse:clickhouse-http-client:{package_version_obj}",
]

if package_version_obj >= Version("0.5.0"):
# before 0.5.0 builtin Java HTTP Client was used
result.append(f"org.apache.httpcomponents.client5:httpclient5:{apache_http_client_version_obj}")

return result

@classproperty
def package(cls) -> str:
"""Get package name to be downloaded by Spark."""
msg = "`Clickhouse.package` will be removed in 1.0.0, use `Clickhouse.get_packages()` instead"
warnings.warn(msg, UserWarning, stacklevel=3)
return "ru.yandex.clickhouse:clickhouse-jdbc:0.3.2"
def package(self) -> str:
"""Get a single string of package names to be downloaded by Spark for establishing a Clickhouse connection."""
return "com.clickhouse:clickhouse-jdbc:0.6.0,com.clickhouse:clickhouse-http-client:0.6.0,org.apache.httpcomponents.client5:httpclient5:5.3.1"

@property
def jdbc_url(self) -> str:
Expand Down
2 changes: 1 addition & 1 deletion onetl/connection/db_connection/greenplum/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,7 @@ def get_packages(
scala_ver = Version(scala_version).min_digits(2)
elif spark_version:
spark_ver = Version(spark_version).min_digits(2)
if spark_ver > Version("3.2") or spark_ver < Version("2.3"):
if spark_ver >= Version("3.3") or spark_ver < Version("2.3"):
raise ValueError(f"Spark version must be 2.3.x - 3.2.x, got {spark_ver}")
scala_ver = get_default_scala_version(spark_ver)
else:
Expand Down
Loading