Replace python based integration test with sqllogictest #4462

Closed
Tracked by #4460
alamb opened this issue Dec 1, 2022 · 5 comments · Fixed by #4834
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed), sqllogictest (SQL Logic Tests (.slt))

Comments

alamb (Contributor) commented Dec 1, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have a great integration test suite from @jimexist 🦾 https://github.com/apache/arrow-datafusion/tree/master/integration-tests that runs a limited number of queries against data in both Postgres and DataFusion and compares the results.

The major downside is that it does not get updated with new coverage very often. I believe this is due to two factors:

  1. It is not run with cargo test
  2. It requires additional system setup above and beyond what is needed for developing DataFusion (Postgres, Python, datafusion-python, etc.)

Describe the solution you'd like
I would like to port all the existing coverage in integration-tests to the sqllogictest framework added in #4395 and remove the Python-based version

In order to run the same tests against postgres, we could reuse some of the code from sqllogictest-bin to implement a postgres driver: https://github.com/risinglightdb/sqllogictest-rs/blob/main/sqllogictest-bin/src/engines/postgres.rs

I think we would likely have to add some sort of annotation to the tests like

# compare_to(postgres)

And then skip tests annotated like that by default (a sketch of how such gating might work follows).
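
For illustration, here is a minimal sketch of how a test harness might gate such annotated files. The # compare_to(postgres) marker and the PG_COMPAT opt-in environment variable are assumptions for the sketch, not an existing API (the commits on the eventual PR mention a PG_COMPAT variable, but the mechanics here are illustrative only):

use std::{env, fs, path::Path};

/// Decide whether a .slt file should also be verified against Postgres.
/// Sketch only: the annotation and the PG_COMPAT variable are hypothetical.
fn should_run_against_postgres(slt_path: &Path) -> bool {
    // Opt in explicitly so the default `cargo test` run never needs Postgres.
    if env::var("PG_COMPAT").is_err() {
        return false;
    }
    // Look for the annotation near the top of the file.
    fs::read_to_string(slt_path)
        .map(|contents| {
            contents
                .lines()
                .take(10)
                .any(|line| line.trim() == "# compare_to(postgres)")
        })
        .unwrap_or(false)
}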

Describe alternatives you've considered
Keep the existing Python-based integration tests.

Additional context
Add any other context or screenshots about the feature request here.

alamb added the enhancement label on Dec 1, 2022
alamb mentioned this issue on Dec 1, 2022
alamb added the help wanted label on Dec 1, 2022
alamb (Contributor, Author) commented Dec 1, 2022

I think this would be a nice project for someone looking to get to know DataFusion better -- it would involve learning about sqllogictest-rs and porting the existing coverage over.

melgenek (Contributor) commented Jan 6, 2023

Hi there! I am new to DataFusion and would like to learn more about it.
I found this issue, which I'd like to tackle.

I did some investigation regarding sqllogictest, sqllogictest-rs, and DataFusion, and I now see a few ways to implement a solution.

My main question is: what level of compatibility between Postgres and DataFusion would you like to check?
Currently, the tests in integration-test compare values within an error tolerance using np.testing.assert_allclose.
But while values are close, they are not always fully equal in terms of representation.
For example, the test simple_window_partition_aggregation.sql result has an avg value that Postgres represents as -13.8571428571428571, but DataFusion represents as -13.857142857142858. The values are close, but their text representations differ.


Regarding implementation, I can imagine several approaches:

1. The most basic way, requiring no changes to sqllogictest-rs, would be to define sqllogictest files with results written out explicitly. Each test would be created twice: once for Postgres, and once for DataFusion. The two copies would differ only where the engines' output representations differ.

For example,

onlyif postgres
query R
select 2.0;
----
2.0

onlyif datafusion
query R
select 2.0;
----
2

Pros:

  • each engine's output can be different. For example, as in the example above, the values are the same even though their representations differ.
  • this doesn't change the idea of sqllogictest, and uses out-of-the-box features

Cons:

  • all outputs have to be created manually.
  • in many cases there would be just duplicate queries and duplicate sets of results.

2. Another option that uses existing features is:

  • create a test.slt file without outputs
  • run Postgres to fill in the results, the same way the --complete parameter works now, and write the output to a test.tmp.slt file
  • run DataFusion over test.tmp.slt

Pros:

  • no new sqllogictest-rs features required

Cons:

  • the only way to compare results is by their text representations

3. Use query labels to compare the results of identically labeled queries, but run multiple database engines at the same time.

The queries would look like

onlyif postgres
query R nosort q42
select 2.0;

onlyif datafusion
query R nosort q42
select 2.0;

Pros:

  • results are generated by sqllogictest. There is no need to define them manually
  • there are no new sqllogictest syntax features
  • the tool would be pretty generic. It could be reused for different database combinations

Cons:

  • sqllogictest-rs currently doesn't support comparison of labeled queries. This has to be implemented.
  • this is not really an intended use of onlyif. sqllogictest in general, and its implementations, assume that there is one database at a time: the same suite of tests can run on many different databases, but not at the same time. In my suggestion, we assume that there are 2 databases. This fact is not necessarily bad, but it stretches the current idea of sqllogictest a little
  • there is no way to define comparison rules, like in option 2

4. Create an sqllogictest::Runner with a custom Validator.

The Validator is currently defined as

pub type Validator = fn(actual: &[Vec<String>], expected: &[String]) -> bool;

If Validator is updated to also carry the original query record, then the implementation can actually run Postgres and disregard whatever expected value is passed to the validator.

pub type Validator = fn(record: Record, actual: &[Vec<String>], expected: &[String]) -> bool;

...
fn postgres_validator(record: Record, actual: &[Vec<String>], expected: &[String]) -> bool {
    // let postgres_result = call_postgres(record.sql);
    // postgres_result == actual
    unimplemented!()
}

Pros:

  • relatively small number of changes to sqllogictest-rs
  • provides a more elaborate Validator that could be used for other purposes

Cons:

  • comparing float values with an error tolerance would still require more changes. sqllogictest-rs currently treats the results in the actual/expected fields as String, so there is no built-in way to do number rounding; an enum such as DBOutput would have to be introduced instead of String to carry type information. See this issue for a discussion of non-text comparisons. (A sketch of what a tolerance-based comparison could look like on the current String-based signature follows this list.)
  • the value Validator would likely need to be merged with a type validator; otherwise Postgres might have to be called twice, once during value validation and once during type validation
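
To make the float issue concrete, here is a minimal sketch of a tolerant comparison that fits the current String-based Validator signature by attempting to parse cells as f64. It is only an illustration: the whitespace splitting of expected lines, the 1e-9 relative tolerance, and the helper names are assumptions, and calling Postgres from inside the validator would still need the Record change described above.

/// Compare two cells, treating values that both parse as f64 as equal
/// when they are within a small relative tolerance (sketch only).
fn cells_equal(actual: &str, expected: &str) -> bool {
    match (actual.parse::<f64>(), expected.parse::<f64>()) {
        (Ok(a), Ok(e)) => {
            let scale = a.abs().max(e.abs()).max(1.0);
            (a - e).abs() <= 1e-9 * scale
        }
        _ => actual == expected,
    }
}

/// A tolerant validator matching the current Validator signature.
fn approx_validator(actual: &[Vec<String>], expected: &[String]) -> bool {
    actual.len() == expected.len()
        && actual.iter().zip(expected).all(|(row, line)| {
            // Assumes multi-column rows are laid out as whitespace-separated cells.
            let expected_cells: Vec<&str> = line.split_whitespace().collect();
            row.len() == expected_cells.len()
                && row.iter().zip(expected_cells).all(|(a, e)| cells_equal(a, e))
        })
}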

5. Introduce a way to run a statement/query on multiple databases at the same time.

This could be achieved either by using labels

query R nosort postgres datafusion
select 2.0;

or in a way similar to what was proposed in the description of this issue:

# compare(datafusion, postgres)
query R
select 2.0;

Pros:

  • this is a new generic feature that would allow having multiple databases at the same time

Cons:

  • sqllogictest-rs would have to be updated to support multiple runners
  • # compare(datafusion, postgres) would require updating not only the execution logic, but also the parsing logic

To summarize my understanding of sqllogictest:

  • tests check the text representations of results
  • if values that are the same but not exactly equal need to be compared, they have to be transformed to the same text representation. For example, comparing floating point numbers could require using SQL functions like round or truncate.

In order to demonstrate how Postgres compatibility could look using sqllogictest, I created an implementation based on the 2nd approach: #4834.

I would be glad to hear feedback on my PR and thoughts about which way to proceed.
Thank you in advance!

alamb (Contributor, Author) commented Jan 7, 2023

Hi @melgenek

Hi there! I am new to DataFusion and would like to learn more about it.

That is great! Welcome!

My main question is: what level of compatibility between Postgres and DataFusion would you like to check?

I think for this ticket we should strive for "the same level as is currently verified using the python integration tests" as much as possible.

The values are close, but their text representations differ.

Yes, this is a classic "floating point rounding error" type of situation, and it is why it is typically not a great idea to compare floating point values directly.

In terms of how to implement this, would something like this work (initially):

  1. For some specific files (maybe those that start with postgres_validated_), the DataFusion sqllogictest runner would run the file both against Postgres and against DataFusion and compare the results. These files might initially contain only the existing queries from the Python-based integration test, but we could expand them over time.

  2. For both DataFusion and Postgres, apply some sort of normalization to floating point values prior to printing (round them to a smaller number of significant figures, as in https://github.com/apache/arrow-datafusion/blob/3cc607de4ce6e9e1fd537091e471858c62f58653/datafusion/core/tests/sqllogictests/src/normalize.rs#L78-L95, for example). Maybe we could add that to the compare directive. (A sketch of such a rounding step follows.)
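
For illustration, a minimal sketch of rounding a float to a fixed number of significant figures before rendering it as text. This is not the code at the linked normalize.rs; the 12-figure cutoff and the function name are arbitrary choices for the sketch.

/// Render a float rounded to `sig_figs` significant figures so that
/// Postgres and DataFusion results normalize to the same text (sketch only).
fn normalize_float(value: f64, sig_figs: i32) -> String {
    if value == 0.0 || !value.is_finite() {
        return format!("{value}");
    }
    let magnitude = value.abs().log10().floor() as i32;
    let factor = 10f64.powi(sig_figs - 1 - magnitude);
    format!("{}", (value * factor).round() / factor)
}

fn main() {
    // The two renderings from the earlier avg example collapse to one string.
    assert_eq!(
        normalize_float(-13.8571428571428571, 12),
        normalize_float(-13.857142857142858, 12)
    );
    println!("{}", normalize_float(-13.857142857142858, 12));
}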

I also left some feedback on #4834 (review) -- does that make sense?

melgenek (Contributor) commented Jan 7, 2023

I also left some feedback on #4834 (review) -- does that make sense?

Thank you for the feedback.

For both DataFusion and Postgres, apply some sort of normalization to floating point values prior to printing (round them to a smaller number of significant figures)

I think this is a good suggestion. I'll update the Postgres/DataFusion clients to include a normalization step for results.


In order to proceed further, I'd like to clarify how you'd like the .slt tests to be defined.
In #4834 I implemented the tests similarly to the existing integration-tests: there are only queries, and the results are generated and compared by the test runner.

Regular .slt files are supposed to have expected outputs, and in this comment, #4834 (comment), you suggest the same approach for the Postgres compatibility tests.

Having explicit results is a totally viable approach, especially given the --complete option for running sqllogictests.

So do we agree that the way to go is to require all .slt tests to have expected outputs, and then simply verify the same tests with the Postgres/DataFusion runners?

alamb (Contributor, Author) commented Jan 8, 2023

So do we agree that the way to go is to require all .slt tests to have expected outputs, and then simply verify the same tests with the Postgres/DataFusion runners?

Yes, that would be my preferred solution -- it will ensure that DataFusion produces answers consistent with Postgres, and that the answers can also be verified by looking at the .slt file.

Just so it is clear, I do not expect every single .slt file now (or ever) to produce the same answers as Postgres (as DataFusion supports different syntax and likely also has existing discrepancies, etc.). I think we should start simple, with a few explicitly marked files used for Postgres verification, and we can expand the scope of which files are tested against Postgres over time.

xudong963 pushed a commit that referenced this issue Jan 21, 2023
* Move Datafusion engine to a package

* Run postgres and datafusion tests and compare outputs

* clippy and fmt

* Remove unused line

* Floating types rounding

* Make basic types compatible

* Format with BigDecimal to keep arbitrary precision.

* Fill postgres queries with results

* Run Postgres separately from DataFusion.
Merge slt files

* fmt

* clippy

* Postgres step in github actions

* Run sqllogictest with Postgres on plain ubuntu

* Update sqllogictest readme

* Revert unnecessary change

* Add license headers

* Don't log PG_COMPAT value

* clippy and fmt

* remove whitespaces in readme