Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Apply RTRIM on string column when generating partitions with -tsp #1182

Merged
merged 2 commits into from
Jul 2, 2024

Conversation

nehanene15
Copy link
Collaborator

Closes Issue #1179

Applies RTRIM on string PKs when generating partitions with the --trim-string-pks flag. With this, the YAML configs will have trimmed whitespace from the string value in the WHERE clause. i.e

  filters:
  - source: ' ( "name" < ''Bob'' ) OR ( ( "name" = ''Bob'' ) AND ( "id" < 2 ) )'
    target: ' ( `name` < ''Bob'' ) OR ( ( `name` = ''Bob'' ) AND ( `id` < 2 ) )'

@nehanene15 nehanene15 requested a review from a team as a code owner June 28, 2024 15:58
@nehanene15
Copy link
Collaborator Author

/gcbrun

@nehanene15
Copy link
Collaborator Author

Below are the results of first_keys_table.compile() here. This change only adds and RTRIM to the string PK columns

WITH t0 AS (
  SELECT rtrim(t2."name") AS "name", t2."id",
         (row_number() OVER (ORDER BY t2."name" ASC, t2."id" ASC) - 1) + 1 AS "dvt_pos_num"
  FROM udfs.pk_bug t2
)
SELECT t1.*
FROM (
  SELECT t0.*
  FROM t0
  WHERE (((t0."dvt_pos_num" * 2) - (floor((t0."dvt_pos_num" * 2) / 3) * 3)) <= 2) AND
        (((t0."dvt_pos_num" * 2) - (floor((t0."dvt_pos_num" * 2) / 3) * 3)) > 0)
) t1
ORDER BY t1."name" ASC, t1."id" ASC

With -tsp:

WITH t0 AS (
  SELECT t2."name", t2."id",
         (row_number() OVER (ORDER BY t2."name" ASC, t2."id" ASC) - 1) + 1 AS "dvt_pos_num"
  FROM udfs.pk_bug t2
)
SELECT t1.*
FROM (
  SELECT t0.*
  FROM t0
  WHERE (((t0."dvt_pos_num" * 2) - (floor((t0."dvt_pos_num" * 2) / 3) * 3)) <= 2) AND
        (((t0."dvt_pos_num" * 2) - (floor((t0."dvt_pos_num" * 2) / 3) * 3)) > 0)
) t1
ORDER BY t1."name" ASC, t1."id" ASC

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the right fix. I also tested this on a Teradata to BQ example to be doubly sure. The following command data-validation generate-table-partitions -sc teradata -tc bq -hash '*' -pk course_id,quarter_id,student_id -cdir patch -pn 3 -tsp -tbls=udfs.issue1179=pso_data_validator.test_generate_partitions produces correct results with -tsp and inappropriate results without it.

I discovered a different bug in the process of trying this out see - Open
generate table partitions replaces whitespace in primary keys with a single space

Good work.

Sundar Mudupalli

@nehanene15 nehanene15 merged commit 9dcaad1 into develop Jul 2, 2024
5 checks passed
@nehanene15 nehanene15 deleted the issue1179-whitespace branch July 2, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants