Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace deprecated clickhouse driver #4925

Conversation

tangjiangling
Copy link

Fixes: #4924

@tangjiangling
Copy link
Author

tangjiangling commented Jan 16, 2022

A related issue(trinodb/trino#10541)

@tangjiangling
Copy link
Author

cc: @ebyhr @hashhar

@tangjiangling tangjiangling force-pushed the replace-deprecated-clickhouse-driver branch from 4f10e21 to 6cc31d1 Compare January 17, 2022 04:28
@kiview
Copy link
Member

kiview commented Jan 18, 2022

Looks generally good to me, but I wonder what are the implications for users, that still have the ru.yandex dependency on their classpath (and not the new one). Probably there are none since the container already sets its own WaitStrategy (instead of using a JDBC test query). WDYT @rnorth?

@tangjiangling
Copy link
Author

I wonder what are the implications for users, that still have the ru.yandex dependency on their classpath (and not the new one).

@zhicwu Can you help explain why you need to refactor the new driver?

@zhicwu
Copy link

zhicwu commented Jan 18, 2022

As you probably knew, ClickHouse spinout from Yandex was announced in late September 2021(details at here). Since then the official website was changed from https://clickhouse.tech to https://clickhouse.com, and docker images were moved from yandex to clickhouse. To follow the path, starting from v0.3.2, the JDBC driver started to use new package name and maven group id.

From technical perspective, refactoring was discussed as in ClickHouse/clickhouse-java#570 and was kicked off since 0.3.0. Apart from that, it's worthy of mention that 0.3.2, unlike the version implied, is a complete re-write(primarily to address some fundemental issues).

@kiview, the legacy JDBC driver(ru.yandex.clickhouse.ClickHouseDriver) co-exists with new one(com.clickhouse.jdbc.ClickHouseDriver) in 0.3.2 and 0.3.3(not released yet). Perhaps in testcontainer we can take below approach to mitigate the potential impact:

  1. add a flag(environment varialbe and/or system property) to switch between not only old and new JDBC drivers but also docker image(yandex/clickhouse-server vs. clickhouse/clickhouse-server) - use old driver and docker image by default
  2. warn user when creating container from old docker image - suggest to use clickhouse/clickhouse-server:21.3 or above

We can revisit this later(maybe June / July?) to remove the warning message and retire the flag.

@tangjiangling
Copy link
Author

tangjiangling commented Jan 29, 2022

Perhaps in testcontainer we can take below approach to mitigate the potential impact:
1. add a flag(environment varialbe and/or system property) to switch between not only old and new JDBC drivers but also docker image(yandex/clickhouse-server vs. clickhouse/clickhouse-server) - use old driver and docker image by default
2. warn user when creating container from old docker image - suggest to use clickhouse/clickhouse-server:21.3 or above

@kiview Sorry for the delay. Even though you made a +1 in the above comment, I still need to confirm with you if you agree to do so? If you agree, I will update the current PR by this Sunday.

Also thank you very much @zhicwu for your answers and suggestions for changes.

@tangjiangling
Copy link
Author

We can revisit this later(maybe June / July?) to remove the warning message and retire the flag.

@kiview @zhicwu I am willing to follow up on this issue on an ongoing basis as well.

The driver `ru.yandex.clickhouse.ClickHouseDriver` has been marked as deprecated.
Also everything in package `ru.yandex.clickhouse` will be removed starting from 0.4.0.
@tangjiangling tangjiangling force-pushed the replace-deprecated-clickhouse-driver branch from 6cc31d1 to f785aa5 Compare January 30, 2022 20:49
@tangjiangling
Copy link
Author

I will update the current PR by this Sunday.

Updated.

@zhicwu
Copy link

zhicwu commented Jan 31, 2022

@kiview @zhicwu I am willing to follow up on this issue on an ongoing basis as well.

Thank you @tangjiangling. I'm all good with change.

@kiview, appreciate if any of you can comment to move this forward. In case there's concern of mixing two drivers for one container, we can add ClickHouseLegacyContainer to separate the implementation.

@kiview
Copy link
Member

kiview commented Jan 31, 2022

@zhicwu @tangjiangling
I think we got mixed up in this PR. Sadly I can't see the original commit anymore, since there were some force-pushes and squashed commits.

The driver dependency is only for the internal tests, there are no implications to the users (which I was thinking about in #4925 (comment)).

The module should support the new driver without any changes to the code.

The default no-args constructor is already deprecated, users should provide the image they want to use that is compatible with their driver. As such, I struggle to understand the need for this PR, at least with this amount of user-facing logs.

It might be, that the original PR was fine, but I can't check this now.

Sorry for the back and forth, but maybe we learned that the PR is not necessary in the first place.

@zhicwu
Copy link

zhicwu commented Jan 31, 2022

Thanks @kiview for the clarification. I think the confusing came from getDriverClassName() method and the default driver class :)

Does it make sense to only change below constant to com.clickhouse.jdbc.ClickHouseDriver?

private static final String DRIVER_CLASS_NAME = "ru.yandex.clickhouse.ClickHouseDriver";

Update:
On a second thought, since ClickHouse supports multiple protocols(e.g. http, grpc, tcp/native, mysql and postgresql etc.), I think it's better to enhance the module with following changes:

  1. change DRIVER_CLASS_NAME to com.clickhouse.jdbc.ClickHouseDriver to avoid confusing
  2. exposes more ports than just http - getJdbcUrl() should be updated accordingly
  3. use JDBC test query(or better java.sql.Connection.isValid()) to check server readiness - try DRIVER_CLASS_NAME first, fall back to DriverManager so it will work with legacy driver as well as 3rd party drivers, and lastly HttpWaitStrategy(or check server logs)

Let me know if above looks good to you. I think we can hold the changes until clickhouse-jdbc v0.3.3(with tcp/native support in addition to http and grpc) is released.

@kiview
Copy link
Member

kiview commented Jan 31, 2022

I looked a bit more into our implementations and the interactions with our code to refresh my context on this PR. Indeed there is a use case where the value of ClickHouseContainer.getDriverClassName() (and also DEFAULT_IMAGE_NAME) comes into play and that is when using a special TC JDBC-URL (as implemented through ClickHouseProvider):
https://www.testcontainers.org/modules/databases/jdbc/

@zhicwu What would you consider the most stable way to wait for ClickHouse to be ready? If we have a sufficiently stable log message, we can just configure it (in the same way we do it for PostgreSQLContainer) and thereby remove some more complexities.

Whatever we do or change, it would be helpful for me to understand the need from a module user perspective. As it is now, users can already get the HTTP_PORT and the NATIVE_PORT and thereby construct URLs according to their needs.

So in order to better see where we need to go with this PR, I need to understand if this is about direct usage of ClickHouseContainer, or only about the special JDBC- URL use case.

@zhicwu
Copy link

zhicwu commented Jan 31, 2022

It's direct ClickHouseContainer usage. In trinodb/trino#10801, getDriverClassName() was customized for using legacy driver to test against ClickHouse 20.3, and new driver for 20.7+. Unfortunately TC JDBC URL won't help in this case.

private static ClickHouseContainer createContainer(DockerImageName image)
{
    boolean latestDriverMinimumSupportedVersion = ClickHouseVersion.of(image.getVersionPart()).isNewerOrEqualTo(CLICKHOUSE_LATEST_DRIVER_MINIMUM_SUPPORTED_VERSION);
    String driverClass = getClickhouseDriverClassName(latestDriverMinimumSupportedVersion);
    return new ClickHouseContainer(image)
    {
        @Override
        public String getDriverClassName()
        {
            return driverClass;
        }
    };
}

What would you consider the most stable way to wait for ClickHouse to be ready?

Yes, it's better to check logs than probing ports which may or may not be enabled. By default, you can watch /var/log/clickhouse-server.log until <Information> Application: Ready for connections showed up(see here for more information) or timed out for waiting.

Since user may customize the configuration, maybe it's better to check /var/lib/clickhouse/preprocessed_configs/config.xml (merged configuration) first, for examples: (/clickhouse|/yandex)/logger/log and (/clickhouse|/yandex)/logger/log_level.

<clickhouse>
    <logger>
        <!-- Possible levels [1]:
          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace
          - test (not for production usage)
            [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
        -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    </logger>
</clickhouse>

@kiview
Copy link
Member

kiview commented Feb 1, 2022

@zhicwu If you are using ClickHouseContainer as an object, it is unclear to me, why you needed to override getDriverClassName() in trinodb/trino#10801, this should not be necessary.

Regarding LogMessageWaitStrategy, Testcontainers will wait for a certain log message to appear in the container's stdout, this is the contract against which the wait strategy will integrate. However, if you suspect the actual logging to be highly dependent on user configuration (which I personally would not expect in Testcontainers usage scenarios), then it would not be an appropriate change.

Also, note that CI is currently failing for the ClickHouse tests.

Also, looking at something similar we had with MySQL in the past, I'd suggest a minimal implementation along the lines of #1760.

@tangjiangling
Copy link
Author

tangjiangling commented Feb 1, 2022

why you needed to override getDriverClassName() in trinodb/trino#10801, this should not be necessary.

#4925 (comment)

As you probably know, the new Driver(com.clickhouse.jdbc.ClickHouseDriver) is not compatible with ClickHouse servers smaller than version 20.7, so for compatibility we still need to use the old Driver(ru.yandex.clickhouse.ClickHouseDriver), which is exactly what Trino did.

I think the most critical issue is that the new driver is not well backward compatible (there may be some objective reasons for this).

@kiview
Copy link
Member

kiview commented Feb 1, 2022

There is no obvious need for code interacting with ClickHouseContainer to call getDriverClassName(). However, internal Testcontainers code will use getDriverClassName() when the special JDBC URL integration is used.

Can you please point me at the code in Trino that interacts with getDriverClassName(), so I can understand the actual use case better?

@tangjiangling
Copy link
Author

tangjiangling commented Feb 1, 2022

Can you please point me at the code in Trino that interacts with getDriverClassName(), so I can understand the actual use case better?

trinodb/trino@8f59446#diff-ab0fd6dd2143f575dfbc2c3a631ebc4db19b2d1f9124ef58ce4c20a368764624R62-R81

Trino does not call ClickHouseContainer#getDriverClassName directly, this method is only used inside Testcontainers (as far as I know just to do health checks).

Why do I submit this PR?

  1. Trino wants to upgrade to use the new Driver, why should there be a new Driver? The reason is as @zhicwu said (Replace deprecated clickhouse driver #4925 (comment)), not only does this include Trino's own code, but we also want Testcontainers to use the new Driver (if Testcontainers don't make any changes, this will not affect Trino)
  2. Out of passion, we also expect to help clickhouse-jdbc upgrade to solve the compatibility problem of the new Driver on some basic components (there are many components using Testcontainers, and we expect them all to use the new Driver without having to think about solving the compatibility problem of the new Driver)

I think the most critical issue is that the new driver is not well backward compatible (there may be some objective reasons for this).

As I mentioned before, this is the key point.

cc @zhicwu You can add if I'm wrong or didn't make a point.

@tangjiangling
Copy link
Author

Also, note that CI is currently failing for the ClickHouse tests.

I'll take a look.

@eddumelendez
Copy link
Member

Hi @tangjiangling, there have been a couple of PRs taking care of compatibility with the new docker image and the driver. See #3021 and #5474.

The default image remains the old one due to backward compatibility (it can change in a major version). Image usage and the driver depends mostly on the usage. So, AFAICT needs are being covered.

Thanks for the PR anyway!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace deprecated ClickHouse Driver
4 participants