Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix default collation of tables when creating in JDBC DBs, make them be consistent with others. #336

Merged
merged 3 commits into from
Oct 14, 2021

Conversation

thongdk8
Copy link
Contributor

@thongdk8 thongdk8 commented Oct 4, 2021

Currently in MySQL, SQLServer the default collation is not followed binary code point comparison sort which is inconsistent with other JDBC DBs and other storage as well. This PR was made for fixing it. PTAL! Thank you!

@thongdk8 thongdk8 added the bugfix label Oct 4, 2021
@thongdk8 thongdk8 self-assigned this Oct 4, 2021
@thongdk8 thongdk8 changed the title Fix default collation of tables when creating in JDBC DBs, make them be consistent with others. [WIP]Fix default collation of tables when creating in JDBC DBs, make them be consistent with others. Oct 4, 2021
@thongdk8 thongdk8 changed the title [WIP]Fix default collation of tables when creating in JDBC DBs, make them be consistent with others. Fix default collation of tables when creating in JDBC DBs, make them be consistent with others. Oct 4, 2021
@thongdk8 thongdk8 marked this pull request as ready for review October 4, 2021 12:05
Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for finding this issue! I left one question about the sqlserver part. I think the MySQL part is fine. Thanks.

@@ -78,7 +78,7 @@
ImmutableMap.<DataType, String>builder()
.put(DataType.INT, "INT")
.put(DataType.BIGINT, "BIGINT")
.put(DataType.TEXT, "VARCHAR(8000)")
.put(DataType.TEXT, "VARCHAR(8000) COLLATE Latin1_General_BIN")
Copy link
Collaborator

@brfrn169 brfrn169 Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use utf8 encoding instead of latin1?

Copy link
Contributor Author

@thongdk8 thongdk8 Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't found the collation in SQL Server that contains both utf8, and bin posfix for binary comparison sort.

Copy link
Contributor Author

@thongdk8 thongdk8 Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the list of all collations with binary sort order in SQL Server I found. https://docs.google.com/spreadsheets/d/1j7-5lM6tRR8O-YJErc4b3iyfk3HogNj_qNlPqFrN7os/edit?usp=sharing

@thongdk8
Copy link
Contributor Author

thongdk8 commented Oct 13, 2021

Summary after discussing with Microsoft Support Team:

  • SQL Server currently doesn't have a collation that supports the whole utf8 character set.
  • We have to specify the language character set, for example, Latin..., "Japanese...
    When inserting for example the collation is Latin1_General_100_BIN2_UTF8 and we insert Japanese characters, it will become ???... in the database

@brfrn169
Copy link
Collaborator

@thongdk8 Thank you for summarizing the discussion with Microsoft Support Team! So it seems like it's better to make the SQLServer collation configurable so that users can select it. And I think we can do that in another PR. I think we can merge this PR for now. Thanks.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the investigation! LGTM!

@brfrn169 brfrn169 merged commit 610ad2f into master Oct 14, 2021
@brfrn169 brfrn169 deleted the fix-case-sensitive-jdbc-tables branch October 14, 2021 01:52
brfrn169 pushed a commit that referenced this pull request Oct 31, 2021
brfrn169 pushed a commit that referenced this pull request Nov 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants