Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] DBT Unit tests to support Redshift limitations such as LISTAGG, MEDIAN, PERCENTILE_CONT, etc. #807

Closed
3 tasks done
tanapoln opened this issue May 2, 2024 · 2 comments
Labels
feature:unit-tests Issues related to built-in dbt unit testing functionality type:enhancement New feature or request

Comments

@tanapoln
Copy link

tanapoln commented May 2, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The current DBT Unit tests feature doesn't support Redshift if the SQL in CTE contains functions such as LISTAGG, MEDIAN, PERCENTILE_CONT, etc. Those functions must be executed against a user-created-table. However, DBT combines given rows to be part of CTE, which is unsupported by Redshift. You can try SQL below.

create temporary table "test_tmpxxxxx" as (
   with test_fixture as (
       select
         cast(1000 as integer) as id,
         cast('menu1' as character varying(500)) as name,
         cast( 1 as integer) as quantity
      union all
      select
         cast(1001 as integer) as id,
         cast('menu2' as character varying(500)) as name,
         cast( 1 as integer) as quantity
      union all
      select
         cast(1003 as integer) as id,
         cast('menu1' as character varying(500)) as name,
         cast( 1 as integer) as quantity
   ),
   agg as (
      SELECT
         LISTAGG(name || ' x ' || quantity, ',') AS option_name_list,
         id
      FROM test_fixture
      GROUP BY id
   )
   select * from agg
);

This will error [XX000] ERROR: One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc

But the query below works fine:

create temporary table "test_tmp1234" as (
   SELECT
      cast(1000 as integer) as id,
      cast('menu1' as character varying(500)) as name,
      cast( 1 as integer) as quantity
   union all
   select
      cast(1001 as integer) as id,
      cast('menu2' as character varying(500)) as name,
      cast( 1 as integer) as quantity
   union all
   select
      cast(1000 as integer) as id,
      cast('menu1' as character varying(500)) as name,
      cast( 1 as integer) as quantity
);

with agg as (
   SELECT
      LISTAGG(name || ' x ' || quantity, ',') AS option_name_list,
      id
   FROM test_tmp1234
   GROUP BY id
)
select * from agg;

As I mentioned, if all given rows are created as temporary table first, then run the test by referring to it, it will be fine.

This is a feature request to separate DBT unit tests into 2 steps

  1. Prepare test fixtures by creating temporary tables
  2. Run unit test query by referring to the temporary tables

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

@tanapoln tanapoln added type:enhancement New feature or request triage:product labels May 2, 2024
@dbeatty10 dbeatty10 added the feature:unit-tests Issues related to built-in dbt unit testing functionality label May 2, 2024
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core May 2, 2024
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 5, 2024

Thanks @tanapoln! Sorry for the delay getting back to you.

Indeed, these functions are not supported in the current implementation of dbt unit tests on Redshift.

There was a bit of confusion with the docs update we made to clarify that lack of support, which presented your reproduction case as a viable workaround, when really it's an explanation of why this does not work:

In order to get this working, I think we'd need to add support for a "materialized" strategy (rather than "CTE") for unit tests:

This would also enable us to fully test incremental strategies, among other things. While it's very useful to have this issue so other users can search for it (thank you!), I'm going to consider its actual resolution blocked on implementing dbt-labs/dbt-core#8499.

@dbeatty10
Copy link
Contributor

@jtcohen6 just noticed the following in the notes for Amazon Redshift patch 182:

  • Updates LISTAGG, MEDIAN, PERCENTILE_CONT and PERCENTILE_DISC to no longer require user-defined tables. Queries that reference catalog tables or that don't reference any tables can also use these functions.

So this might be fixed on Redshift side of things now 🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:unit-tests Issues related to built-in dbt unit testing functionality type:enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants