Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
replicators: add collation support for CHAR and BINARY columns
This commit adds proper collation support for CHAR and BINARY columns in MySQL. When stored, CHAR columns should be right-padded with spaces to the column length, and BINARY columns should be right-padded with zero bytes (0x00).

This commit fixes the issue at snapshot time. During snapshot we do a logical dump of the data, and MySQL removes trailing padding spaces from CHAR columns when retrieving them, so we need to take the column collation into account when storing them. One gotcha is ENUM/SET columns: they are retrieved as strings (MYSQL_TYPE_STRING), but they must not be padded.

During CDC, we need to retrieve the proper metadata from the TME (TABLE_MAP_EVENT) to determine whether padding is necessary.

This commit also fixes an issue when storing BINARY columns. We were storing them as TinyText/Text if the binary representation of the column was a valid UTF-8 string. This is not correct; they should be stored as ByteArray.

Test cases were written using a mix of characters of different byte widths, such as ASCII mixed with 2-byte and 3-byte UTF-8 characters.

Note: MySQL uses the terms charset and collation interchangeably. In the end everything is stored as a collation ID, which can be used to determine both the charset and the collation.

Ref: REA-4366
Ref: REA-4383
Closes: #1247 #1259
Release-Note-Core: Added collation support for storing CHAR and BINARY columns in MySQL with correct padding. This fixes an issue when looking up CHAR/BINARY columns with values that do not match the column length.
Change-Id: Ibb436b99b46500f940efe79d06d86494bfc4bf30
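The commit message says the CDC path needs TME metadata to decide whether padding applies, but does not show how that metadata is read. The sketch below assumes the two-byte metadata layout MySQL writes for `MYSQL_TYPE_STRING` columns in a TABLE_MAP_EVENT: the first byte carries the real type (so ENUM and SET can be told apart from CHAR/BINARY), and for lengths of 256 bytes or more, bits 8-9 of the length are XOR-ed into bits 4-5 of that first byte. `decode_string_metadata` and `needs_padding` are hypothetical names for illustration, not functions from this commit or from `mysql_common`.

```rust
// Real column type codes from MySQL's `enum_field_types`.
const MYSQL_TYPE_ENUM: u8 = 0xF7; // 247
const MYSQL_TYPE_SET: u8 = 0xF8; // 248
const MYSQL_TYPE_STRING: u8 = 0xFE; // 254

/// Hypothetical helper: decode the two TABLE_MAP_EVENT metadata bytes of a
/// MYSQL_TYPE_STRING column into (real_type, max_length_in_bytes).
fn decode_string_metadata(meta0: u8, meta1: u8) -> (u8, u16) {
    if meta0 & 0x30 != 0x30 {
        // Length >= 256: bits 4-5 of meta0 were XOR-ed with bits 8-9 of the
        // length, so restore the real type and rebuild the high length bits.
        let real_type = meta0 | 0x30;
        let len = u16::from(meta1) | (u16::from((meta0 & 0x30) ^ 0x30) << 4);
        (real_type, len)
    } else {
        // Length < 256: meta0 is the real type as-is, meta1 is the length.
        (meta0, u16::from(meta1))
    }
}

/// ENUM and SET are reported as MYSQL_TYPE_STRING but must not be padded.
fn needs_padding(real_type: u8) -> bool {
    real_type != MYSQL_TYPE_ENUM && real_type != MYSQL_TYPE_SET
}

fn main() {
    // CHAR(10) in a single-byte charset: 10 bytes, no high length bits.
    assert_eq!(decode_string_metadata(0xFE, 10), (MYSQL_TYPE_STRING, 10));
    // CHAR(100) in utf8mb4: 400 bytes = 0x190, so meta0 = 0xFE ^ 0x10 = 0xEE.
    assert_eq!(decode_string_metadata(0xEE, 0x90), (MYSQL_TYPE_STRING, 400));
    // An ENUM column arrives as a string type but is never padded.
    let (real_type, _) = decode_string_metadata(MYSQL_TYPE_ENUM, 1);
    assert!(!needs_padding(real_type));
}
```

The byte length recovered here is what a function like `mysql_pad_collation_column` would receive as `col_len`.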
1 parent c44e734, commit d928dd8
Showing 6 changed files with 474 additions and 6 deletions.
@@ -1,5 +1,6 @@
```rust
mod connector;
mod snapshot;
mod utils;

pub(crate) use connector::MySqlBinlogConnector;
pub(crate) use snapshot::MySqlReplicator;
```
@@ -0,0 +1,46 @@
```rust
use std::sync::Arc;

use mysql_common::collations::{self, Collation, CollationId};
use mysql_srv::ColumnType;
use readyset_data::DfValue;

/// Pad a MYSQL_TYPE_STRING (CHAR / BINARY) column value to the correct length for the given column
/// type and charset.
///
/// Parameters:
/// - `val`: The current column value as a vector of bytes.
/// - `col`: The column type.
/// - `collation`: The collation ID of the column.
/// - `col_len`: The length of the column in bytes.
///
/// Returns:
/// - A `DfValue` representing the padded column value - `CHAR` will return a `TinyText` or `Text`
///   and `BINARY` will return a `ByteArray`.
pub fn mysql_pad_collation_column(
    val: &Vec<u8>,
    col: ColumnType,
    collation: u16,
    col_len: usize,
) -> DfValue {
    assert_eq!(col, ColumnType::MYSQL_TYPE_STRING);
    let collation: Collation = collations::CollationId::from(collation).into();
    match collation.id() {
        CollationId::BINARY => {
            if val.len() < col_len {
                let mut padded = val.clone();
                padded.extend(std::iter::repeat(0).take(col_len - val.len()));
                return DfValue::ByteArray(Arc::new(padded));
            }
            DfValue::ByteArray(Arc::new(val.to_vec()))
        }
        _ => {
            let column_length_characters = col_len / collation.max_len() as usize;
            let mut str = String::from_utf8_lossy(val).to_string();
            let str_len = str.chars().count();
            if str_len < column_length_characters {
                str.extend(std::iter::repeat(' ').take(column_length_characters - str_len));
            }
            DfValue::from(str)
        }
    }
}
```
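Because `mysql_pad_collation_column` depends on `readyset_data`, `mysql_srv` and `mysql_common`, the following is a dependency-free sketch of the same padding rule, handy for checking the multi-byte cases the commit message mentions. The `Padded` enum stands in for `DfValue`, the `max_bytes_per_char` parameter replaces `Collation::max_len()`, and the input is taken as `&[u8]` instead of `&Vec<u8>`; all of these are illustrative assumptions, not part of the commit. Collation ID 63 is MySQL's `binary` collation and 255 is `utf8mb4_0900_ai_ci`.

```rust
/// Collation ID MySQL assigns to the `binary` charset/collation.
const BINARY_COLLATION_ID: u16 = 63;

/// Hypothetical stand-in for the `DfValue` variants the real function returns.
#[derive(Debug, PartialEq)]
enum Padded {
    Text(String),
    Bytes(Vec<u8>),
}

/// Sketch of the padding rule. `col_len` is the column length in bytes and
/// `max_bytes_per_char` is the widest character of the column's charset
/// (for example 4 for utf8mb4, 1 for latin1).
fn pad_string_column(val: &[u8], collation: u16, col_len: usize, max_bytes_per_char: usize) -> Padded {
    if collation == BINARY_COLLATION_ID {
        // BINARY(n): right-pad with 0x00 up to n bytes; never truncate.
        let mut padded = val.to_vec();
        if padded.len() < col_len {
            padded.resize(col_len, 0);
        }
        Padded::Bytes(padded)
    } else {
        // CHAR(n): n counts characters, so convert the byte length to a
        // character count and pad with spaces by character, not by byte.
        let target_chars = col_len / max_bytes_per_char;
        let mut s = String::from_utf8_lossy(val).into_owned();
        let current_chars = s.chars().count();
        if current_chars < target_chars {
            s.extend(std::iter::repeat(' ').take(target_chars - current_chars));
        }
        Padded::Text(s)
    }
}

fn main() {
    // CHAR(4) in utf8mb4: 16 bytes; "aé€" mixes 1-, 2- and 3-byte characters.
    let text = pad_string_column("aé€".as_bytes(), 255, 16, 4);
    assert_eq!(text, Padded::Text("aé€ ".to_string()));
    // BINARY(4): padded with zero bytes, not spaces.
    let bytes = pad_string_column(b"ab", BINARY_COLLATION_ID, 4, 1);
    assert_eq!(bytes, Padded::Bytes(vec![b'a', b'b', 0, 0]));
}
```

A CHAR(4) utf8mb4 value of `aé€` occupies 6 bytes but only 3 characters, so it receives a single trailing space; padding by bytes would have appended ten.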