Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sqllogictests doesn't cover cases if the column name is not expected. #6349

Open
alamb opened this issue May 14, 2023 · 3 comments
Open

Sqllogictests doesn't cover cases if the column name is not expected. #6349

alamb opened this issue May 14, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented May 14, 2023

Is your feature request related to a problem or challenge?

As we port our tests to sqllogictests #4460 it is important to retain test coverage

One thing that was noticed is that sqllogictest tests doesn't cover cases if the column name is not expected.

So for example, it will show the following outputs as the same

Int8
1
2

to

a
1
2

Describe the solution you'd like

It would be nice to have some way (perhaps optionally) that verified the column names as well

Describe alternatives you've considered

No response

Additional context

@comphead brought thus up on #6329 (review)

@alamb alamb added the enhancement New feature or request label May 14, 2023
@alamb
Copy link
Contributor Author

alamb commented May 14, 2023

@melgenek I wonder if you have any thoughts on this idea?

@melgenek
Copy link
Contributor

As far as I can tell, sqllogictest in general, and sqllogictest-rs do not support column name checks right now.

There are some ways to extend sqllogictest-rs to support columns. It seems that to add native support for column names one would need to do the following:

  1. add a colnames column check support to the query clause to optionally switch the check on/off, where off is the default. Column names could be the first row in the text representation
query II rowsort colnames
select 1 as one, 2 as two;
----
one two
1     2
  1. update DBOutput and RecordOutput to have a field column_names.
  2. update implementations of the sqllogictest::AsyncDB to return column names along with types and results
  3. introduce a validator for column names similar to the type and result validator. A default implementation would probably be a no-op implementation to prevent behavior changes for library users other than Datafusion. A real implementation would likely just lowecase names and compare strings.

On the other hand, one could make comparisons work on the Datafusion side.

It seems that the EXPLAIN statement already gives the alias names for projections, so it is possible to run the same query twice: once with EXPLAIN, once without. This way both names and values are checked. Of course, there is a lot more information than just names in an EXPLAIN output.

Another way is to create a more powerful version of an arrow_typeof. For example, Databricks has a DESCRIBE QUERY and Snowflake a DESCRIBE RESULT that show the expected output metadata in format similar to

  +---------+------------+
  |col_name |data_type   |
  +---------+------------+
  |one      | int        |
  |two      | bigint     |
  +---------+------------+

This way both specific Arrow types and names could be checked with one query.

@alamb
Copy link
Contributor Author

alamb commented May 15, 2023

Thank you @melgenek -- this is excellent advice.

We already have a version of this (though this also looks at the data) with the DESCRIBE dataframe method... Maybe we could connect it into SQL

https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.describe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants