Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update Teradata rtrim() to account for tabs and other whitespace #1273

Merged
merged 4 commits into from
Sep 20, 2024

Conversation

nehanene15
Copy link
Collaborator

Closes #1272

This updates the default RTRIM(column, " ") for Teradata so that it also accounts for tabs, newlines, and other whitespace characters.

@nehanene15 nehanene15 requested a review from a team as a code owner September 18, 2024 15:34
@nehanene15
Copy link
Collaborator Author

nehanene15 commented Sep 18, 2024

/gcbrun

@sundar-mudupalli-work
Copy link
Collaborator

Hi,

I suggest adding this to our test suite as follows:
Change this line to

,(3,3,3,3,3
,12345678901234567890,1234567890123456789012345,123.3,123456.3,12345678.3
,'Hello DVT','C ','Hello DVT' || _latin '20090B0A0D0C'XCV
,DATE'1970-01-03',TIMESTAMP'1970-01-03 00:00:03'
,CAST('1970-01-03 00:00:03.000-03:00' AS TIMESTAMP(3) WITH TIME ZONE)

This confirms that RTRIM on the Teradata end strips the white space characters at the end before comparison with BQ. Change this line to

,(2,2,2,2,2
 ,NUMERIC '12345678901234567890'
 ,BIGNUMERIC '1234567890123456789012345'
 ,NUMERIC '123.22',123456.2,12345678.2
 ,'Hello DVT','B ','Hello DVT \t\v\n\r\f''
 ,DATE '1970-01-02',DATETIME '1970-01-02 00:00:02'
 ,TIMESTAMP '1970-01-02 00:00:02-02:00')

This confirms that RTRIM on BigQuery removes the whitespace characters prior to comparison with other engines. We could do this for other engines - but this is a good start for Teradata and BigQuery.

Thanks.

Sundar Mudupalli

@nj1973
Copy link
Contributor

nj1973 commented Sep 18, 2024

Remember that dvt_core_types is used for column validation testing (and perhaps comparison fields too), we need to be sure we don't break those tests. Therefore we would need to add the extra characters across all engines because they all get compared to BigQuery, which is no big deal, just pointing it out.

@nehanene15
Copy link
Collaborator Author

We already have a table in TD for testing CHAR PKs with --trim-string-pks. I can add the tabs in that table so it won't disturb the core_types table.

@sundar-mudupalli-work
Copy link
Collaborator

Neil - you are right. In column validation, we use the string length for min, max and sum, so it can create an issue - especially if the dvt_core_types for other databases are not updated. Between Teradata and BQ, the min/max/sum may remain the same - though I had not considered that when I first proposed this.

Thank you.

Sundar Mudupalli

@nehanene15
Copy link
Collaborator Author

@sundar-mudupalli-work I added extra tabs and whitespace to our existing char_id table which tests the use of -tsp and CHAR column validation. Before this fix, the tabs would have broken the hash validation and marked the rows as failures.

Copy link
Collaborator

@helensilva14 helensilva14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@sundar-mudupalli-work sundar-mudupalli-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. You found a way to work around character sets without the ugly hex codes!

@nehanene15 nehanene15 merged commit db7e9d3 into develop Sep 20, 2024
5 checks passed
@nehanene15 nehanene15 deleted the issue1272-tsp-tabs branch September 20, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Teradata Fails removing tabs using rtrim()
4 participants