Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: clickhouse integration tests #2815

Merged
merged 16 commits into from
Jun 13, 2023
Merged

Conversation

eitsupi
Copy link
Member

@eitsupi eitsupi commented Jun 12, 2023

Close #2813, based on #2814

Same as #2811

@eitsupi
Copy link
Member Author

eitsupi commented Jun 12, 2023

Oh, I see. The order of the Clickhouse results seems to be random.
https://stackoverflow.com/questions/54786494/clickhouse-query-row-order-behaviour

I have observed that the order of the results changes each time I run the following query.

WITH table_0 AS (
  SELECT
    13 AS x_int,
    13.0 AS x_float,
    5 AS k_int,
    5.0 AS k_float
  UNION
  ALL
  SELECT
    -13 AS x_int,
    -13.0 AS x_float,
    5 AS k_int,
    5.0 AS k_float
  UNION
  ALL
  SELECT
    13 AS x_int,
    13.0 AS x_float,
    -5 AS k_int,
    -5.0 AS k_float
  UNION
  ALL
  SELECT
    -13 AS x_int,
    -13.0 AS x_float,
    -5 AS k_int,
    -5.0 AS k_float
)
SELECT
  (x_int / k_int)
FROM
  table_0

@max-sixty @aljazerzen Do you think we can add a sort at the end of every PRQL query so that even systems like ClickHouse match the results?

@max-sixty
Copy link
Member

@max-sixty @aljazerzen Do you think we can add a sort at the end of every PRQL query so that even systems like ClickHouse match the results?

Yes I think that's OK — it doesn't limit any SQL functionality. Feel free to merge this with most queries excluded and then follow-up with changed queries unexcluded.

I also think we could have the Unsupported dialects look for e.g. # run_clickhouse, rather than the absence of e.g. # skip_clickhouse, if that lets us add queries without worrying about the tail of dialects. Supported dialects would still require explicit skipping.

(Though maybe that's harder than just adding # skip_clickhouse to the others, feel free to do whatever is easier)

Copy link
Member

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@@ -81,11 +81,12 @@ impl Dialect {
| Dialect::SQLite
| Dialect::Postgres
| Dialect::MySql
| Dialect::MsSql => SupportLevel::Supported,
| Dialect::MsSql
| Dialect::ClickHouse => SupportLevel::Supported,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how many exclusions we need, we could move this down a rung. (Probably we should do that for MsSql too)

@eitsupi
Copy link
Member Author

eitsupi commented Jun 12, 2023

Note: ClickHouse doesn't have lag/lead functions.
https://clickhouse.com/docs/en/sql-reference/window-functions

@eitsupi
Copy link
Member Author

eitsupi commented Jun 13, 2023

The problem that we are currently experiencing is that the results of the following query.

from tracks
sort milliseconds
select display = case {
composer != null => composer,
genre_id < 17 => 'no composer',
true => f'unknown composer'
}
take 10

This is not a connection problem, as the following SQL via MySQL CLI reproduces it.

WITH table_0 AS (
  SELECT
    CASE
      WHEN composer IS NOT NULL THEN composer
      WHEN genre_id < 17 THEN 'no composer'
      ELSE 'unknown composer'
    END AS display,
    milliseconds
  FROM
    tracks
  ORDER BY
    milliseconds
  LIMIT
    10
)
SELECT
  display
FROM
  table_0
ORDER BY
  milliseconds
+---------------+
| display       |
+---------------+
| Samuel Rosa   |
|               |
|               |
|               |
| L. Muggerud   |
|               |
| L. Muggerud   |
|               |
| Gilberto Gil  |
| Chico Science |
+---------------+

@eitsupi
Copy link
Member Author

eitsupi commented Jun 13, 2023

I expected it to be a case when issue, but the following example works.......

from [
    {x = '3'},
    {x = '1'},
]
select x = case {
    x == '3' => 'fiz',
    true => x,
}
MySQL [dummy]> WITH table_0 AS (
    ->   SELECT
    ->     '3' AS x
    ->   UNION
    ->   ALL
    ->   SELECT
    ->     '1' AS x
    -> )
    -> SELECT
    ->   CASE
    ->     WHEN x = '3' THEN 'fiz'
    ->     ELSE x
    ->   END AS x
    -> FROM
    ->   table_0;
+------+
| x    |
+------+
| fiz  |
| 1    |
+------+
2 rows in set (0.025 sec)

@eitsupi
Copy link
Member Author

eitsupi commented Jun 13, 2023

Oh, I understand. Perhaps this is being read as an empty character rather than NULL...

WITH table_0 AS (
  SELECT
    CASE
      WHEN composer != '' THEN composer
      WHEN genre_id < 17 THEN 'no composer'
      ELSE 'unknown composer'
    END AS display,
    milliseconds
  FROM
    tracks
  ORDER BY
    milliseconds
  LIMIT
    10
)
SELECT
  display
FROM
  table_0
ORDER BY
  milliseconds
+------------------+
| display          |
+------------------+
| Samuel Rosa      |
| no composer      |
| no composer      |
| no composer      |
| L. Muggerud      |
| no composer      |
| L. Muggerud      |
| unknown composer |
| Gilberto Gil     |
| Chico Science    |
+------------------+

@eitsupi
Copy link
Member Author

eitsupi commented Jun 13, 2023

https://clickhouse.com/docs/en/operations/settings/formats#format_csv_null_representation
-> Seems not related to...

It seems to be correctly recognized as NULL when reading csv. Perhaps a problem with the table type definition?
https://clickhouse.com/docs/en/sql-reference/data-types/nullable

> SELECT composer FROM file('/var/lib/clickhouse/user_files/chinook/tracks.csv') limit 5;
+------------------------------------------------------------------------+
| composer                                                               |
+------------------------------------------------------------------------+
| Angus Young, Malcolm Young, Brian Johnson                              |
| NULL                                                                   |
| F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman                    |
| F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman |
| Deaffy & R.A. Smith-Diesel                                             |
+------------------------------------------------------------------------+
5 rows in set (0.010 sec)

@eitsupi
Copy link
Member Author

eitsupi commented Jun 13, 2023

Ok, Nullable works...

> WITH table_0 AS (
    ->   SELECT
    ->     CASE
    ->       WHEN composer IS NOT NULL THEN composer
    ->       WHEN genre_id < 17 THEN 'no composer'
    ->       ELSE 'unknown composer'
    ->     END AS display,
    ->     milliseconds
    ->   FROM
    ->     tracks
    ->   ORDER BY
    ->     milliseconds
    ->   LIMIT
    ->     10
    -> )
    -> SELECT
    ->   display
    -> FROM
    ->   table_0
    -> ORDER BY
    ->   milliseconds;
+------------------+
| display          |
+------------------+
| Samuel Rosa      |
| no composer      |
| no composer      |
| no composer      |
| L. Muggerud      |
| no composer      |
| L. Muggerud      |
| unknown composer |
| Gilberto Gil     |
| Chico Science    |
+------------------+
10 rows in set (0.011 sec)

@eitsupi eitsupi marked this pull request as ready for review June 13, 2023 04:18
@eitsupi
Copy link
Member Author

eitsupi commented Jun 13, 2023

It seems working now 🎉

@eitsupi eitsupi merged commit e1bc906 into PRQL:main Jun 13, 2023
@eitsupi eitsupi deleted the clickhouse-test-2 branch June 13, 2023 04:22
@max-sixty
Copy link
Member

Great work!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClickHouse integration test
2 participants