UCS2 to UTF8MB4 Encoding error #1438

Open
dwash96 opened this issue Jul 24, 2024 · 0 comments

dwash96 commented Jul 24, 2024

Not sure if this is expected for fixed-length to variable-length charset conversions, but for rows read from the binlog and applied to the gho table during the migration, the utility does not perform an equivalent character conversion: it keeps the original bytes and simply labels them with the new charset. I can see why this behavior might make sense (charsets do not necessarily have truly equivalent representations of the same displayed character), and it may well be a skill issue on my part, but in case this is not expected behavior, here is the example I saw during my migration:

"M" in UCS2: 0x004D
"M" in UTF8MB4: 0x4D
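A minimal Python sketch of the mismatch above. It assumes MySQL's `ucs2` is big-endian and limited to the Basic Multilingual Plane, so Python's `utf-16-be` codec produces the same bytes for those characters (the `0x004D` example is consistent with that). The function names are hypothetical and only contrast "relabel the bytes" with "decode and re-encode"; MySQL's own handling of invalid byte sequences (which depends on `sql_mode`) is not modeled.

```python
def ucs2_bytes(text: str) -> bytes:
    """Encode text as UCS-2: big-endian, always 2 bytes per character."""
    # UCS-2 cannot represent characters above U+FFFF (e.g. most emoji).
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    return text.encode("utf-16-be")


def reinterpret_as_utf8mb4(raw: bytes) -> str:
    """Keep the bytes and only relabel the charset (the reported behavior)."""
    # Invalid UTF-8 sequences become U+FFFD here; this is a Python choice,
    # not a model of what MySQL stores.
    return raw.decode("utf-8", errors="replace")


def transcode_to_utf8mb4(raw: bytes) -> bytes:
    """Equivalent character conversion: decode as UCS-2, re-encode as UTF-8."""
    return raw.decode("utf-16-be").encode("utf-8")


raw = ucs2_bytes("M")
print(raw.hex())                          # 004d
print(repr(reinterpret_as_utf8mb4(raw)))  # '\x00M' (a NUL followed by "M")
print(transcode_to_utf8mb4(raw).hex())    # 4d

# Fixed vs. variable length: UCS-2 always uses 2 bytes per character,
# UTF-8 uses 1 to 3 bytes for characters in the BMP.
for ch in ("M", "é", "€"):
    print(ch, len(ucs2_bytes(ch)), len(ch.encode("utf-8")))
```

Every UCS2 character has a UTF-8 form, so decoding and re-encoding is lossless in this direction; the reverse is not true for characters above U+FFFF.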

I am in the process of fixing it in my DB by running UPDATE statements in the format:

{column} = IF(CAST({column} AS BINARY) LIKE CONCAT(0x00, '%'), CONVERT(CONVERT(CAST({column} AS BINARY) USING UCS2) USING UTF8MB4), {column})

With some appropriate filters, of course, so that not every row of the new table has to be updated. Is this something gh-ost should handle itself? I think the format above works specifically for UCS2 to UTF8 if it were run against the gho table rows as they are inserted/updated from the binlog, but it's probably not portable beyond this specific case.
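The repair expression above can be modeled in Python to see which stored values it rewrites. This is a hedged sketch that assumes the affected column holds raw UCS2 bytes relabeled as utf8mb4 and that MySQL's `ucs2` matches big-endian UTF-16 for BMP characters; `repair_value` is a hypothetical name that mirrors the `IF(... LIKE CONCAT(0x00, '%') ...)` check. It also illustrates a limitation of that check: only values whose first UCS2 character is in the range U+0000 to U+00FF start with a 0x00 byte, so other affected rows are not detected.

```python
def repair_value(stored: bytes) -> bytes:
    """Mirror the SQL repair: if the bytes start with 0x00, convert ucs2 -> utf8mb4."""
    # CAST({column} AS BINARY) LIKE CONCAT(0x00, '%')  ->  first byte is 0x00
    if stored.startswith(b"\x00"):
        # CONVERT(CONVERT(... USING UCS2) USING UTF8MB4)
        # Assumes an even byte length, as UCS2 data always has.
        return stored.decode("utf-16-be").encode("utf-8")
    return stored  # the ELSE branch: {column} is left unchanged


# Rows applied from the binlog: UCS2 bytes stored as-is in the utf8mb4 column.
print(repair_value("Mike".encode("utf-16-be")))                  # b'Mike'
print(repair_value("José".encode("utf-16-be")).decode("utf-8"))  # José

# Rows that were already converted correctly are left untouched.
print(repair_value("Mike".encode("utf-8")))  # b'Mike'

# Limitation: "€" is 0x20AC in UCS2, so there is no leading 0x00 and the
# heuristic leaves these UCS2 bytes unrepaired.
print(repair_value("€5".encode("utf-16-be")).hex())  # 20ac0035
```

Transcoding at apply time (decode with the source charset, re-encode with the target one) would avoid needing this heuristic at all, at the cost of knowing both charsets for every converted column.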

Either way, this is a fantastic tool and I really appreciate that you all made it. Thanks!
