diff --git a/docs/categories-of-data-quality-checks/how-to-detect-blank-and-whitespace-values.md b/docs/categories-of-data-quality-checks/how-to-detect-blank-and-whitespace-values.md
index 31fd47a6ac..8059c98996 100644
--- a/docs/categories-of-data-quality-checks/how-to-detect-blank-and-whitespace-values.md
+++ b/docs/categories-of-data-quality-checks/how-to-detect-blank-and-whitespace-values.md
@@ -2,7 +2,7 @@
 title: How to detect whitespace and null value placeholders
 ---
 # How to detect whitespace and null value placeholders
-Read this guide to learn how to detect data quality issues in text columns containing spaces, tabs, or special texts equivalent to a null value.
+Read this guide to learn how to use SQL checks to detect whitespace, such as spaces and tabs, and special texts equivalent to a null value in text columns.
 
 The data quality checks for detecting whitespace and empty value placeholders are configured in the `whitespace` category in DQOps.
 
@@ -233,6 +233,107 @@ The reference section provides YAML code samples that are ready to copy-paste to
 the parameters reference, and samples of data source specific SQL queries generated by
 [data quality sensors](../dqo-concepts/definition-of-data-quality-sensors.md) that are used by those checks.
 
+## FAQ
+Answers to common questions about detecting whitespace characters.
+
+### What is a whitespace character?
+A whitespace character is any character that represents a blank space, such as a regular space, a tab, or a new line.
+These characters are often invisible to the human eye but can affect how data is processed and interpreted by databases.
+When users preview column values, values ending with whitespace look the same as trimmed values.
+However, the values don't match, so a query that filters on the exact value in the `WHERE` clause may not find all rows.
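
The effect described above can be reproduced with a minimal sketch. It assumes SQLite (through Python's built-in `sqlite3` module) and a hypothetical `customers` table; neither comes from DQOps. Databases that pad strings before comparing them, such as SQL Server, would match both rows in the first query.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Alice",), ("Alice ",), ("Bob",)],  # the second value ends with a space
)

# An exact-match filter skips the value that carries a trailing space.
exact_matches = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE name = 'Alice'"
).fetchone()[0]

# Trimming the column before comparing finds both rows.
trimmed_matches = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE TRIM(name) = 'Alice'"
).fetchone()[0]

print(exact_matches, trimmed_matches)
```

In SQLite, the exact-match filter counts one row, while the trimmed comparison counts both.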
+
+### Which spelling is correct, "whitespace" or "white space"?
+Both "whitespace" and "white space" are used, but "whitespace" is generally considered the more technically correct and contemporary term, especially in computing and data management contexts.
+You'll find it used more frequently in technical documentation and programming.
+
+### How to check if a column value contains a space in SQL?
+To check for spaces in a SQL column, use the `LIKE` operator (e.g., `column_name LIKE '% %'`) or remove leading and trailing spaces with `TRIM` and compare the result to the original column
+(e.g., `WHERE TRIM(column_name) <> column_name`).
+Remember that spaces are just one type of whitespace, and that some databases, such as SQL Server, ignore trailing spaces in comparisons. For a thorough check, consider your database's specific functions or regular expressions.
+
+### How to check if a column value contains a space in SQL Server?
+In SQL Server, you can use the generic methods of detecting whitespace characters mentioned in the previous answer.
+SQL Server also offers specialized functions like `CHARINDEX` to locate the position of a space within a string, giving you more precise control.
+
+The following query finds values that contain a space character using Transact-SQL in SQL Server:
+
+```sql
+SELECT column_name
+FROM your_table
+WHERE CHARINDEX(' ', column_name) > 0
+```
+
+### What is the difference between using IS NULL and finding whitespace in SQL?
+`IS NULL` checks if a column value is explicitly defined as *NULL*, meaning it has no value at all. This is a specific state recognized by the database.
+On the other hand, finding whitespace (using `LIKE`, `TRIM`, etc.) means looking for characters like spaces, tabs, or newlines that represent "blank" space.
+These are actual characters stored in the database, even though they might appear invisible.
+
+Here's why this difference matters:
+
+* **Database Storage**: Some databases, notably **Oracle**, do not store truly empty string values.
+  Instead, they represent them as *NULL* values.
+  This can lead to unexpected results if you search for empty strings but the database treats them as *NULL* values.
+
+* **Configuration**: Some databases let you configure how comparisons with *NULL* values behave, which can affect query results.
+  In SQL Server, the `SET ANSI_NULLS` option controls whether the `=` and `<>` operators can match *NULL* values.
+  Oracle has no `ANSI_NULLS` setting; it treats an empty string as *NULL*.
+
+* **Sorting and Indexing**: Empty strings and *NULL* values are treated differently in sorting operations and index construction.
+  This can impact the performance and organization of your data.
+
+### How does PostgreSQL handle NULL or empty values in GROUP BY?
+PostgreSQL has a clear and consistent way of handling *NULL* and empty values in `GROUP BY`:
+
+* *NULL* values are grouped together: If a column has multiple *NULL* values, they will be treated as a single group in the `GROUP BY` result.
+
+* Empty strings are grouped together: Similar to *NULL* values, all empty strings (`''`) are considered the same and grouped into a single group.
+
+* *NULL* and empty strings are distinct: Importantly, PostgreSQL distinguishes between *NULL* and empty strings. They are treated as separate groups in the `GROUP BY` clause.
+
+### How to represent a tab character in SQL?
+To represent a tab character in SQL, use the `CHAR(9)` function or its equivalent in your database. The ASCII code for a tab character is 9.
+Some databases may allow you to directly insert a tab character using the Tab key or an escape sequence like `\t`, but a function call such as `CHAR(9)` is the most reliable method in databases
+that support this function.
+
+Here are the names of the `CHAR` or similar functions in the most popular databases:
+
+* **SQL Server**: `CHAR()` is a standard function.
+* **Oracle**: Oracle uses `CHR()` to achieve the same result.
+* **PostgreSQL**: PostgreSQL uses `CHR()`; it has no `CHAR()` function for this purpose.
+* **MySQL**: `CHAR()` is a standard function.
+* **IBM DB2**: Db2 uses `CHR()`; its `CHAR()` function converts a value to a fixed-length string instead.
+* **SQLite**: `CHAR()` is available.
+
+### How to remove whitespace around a text in SQL?
+To remove whitespace around text in SQL, you can use the `TRIM()` function. By default, it removes leading and trailing spaces from a string; whether tabs and newlines are also removed depends on the database, and many databases let you list the characters to remove.
+
+Here's a simple example:
+
+```sql
+SELECT TRIM(' This has extra spaces. ')
+```
+
+This will return:
+
+```text
+This has extra spaces.
+```
+
+### How to remove all spaces inside a text using SQL?
+The `TRIM()` function removes spaces only around text. To remove all spaces within the text, you'll need a different approach. Most databases offer a function like `REPLACE()`.
+This function replaces all occurrences of a specified substring with another string. In this case, you'd replace all spaces (`' '`) with an empty string (`''`).
+
+Here's how it works:
+
+```sql
+SELECT REPLACE(' This has extra spaces. ', ' ', '')
+```
+
+This will return:
+
+```text
+Thishasextraspaces.
+```
+
 ## What's next
 - Learn how to [run data quality checks](../dqo-concepts/running-data-quality-checks.md#targeting-a-category-of-checks) filtering by a check category name
 - Learn how to [configure data quality checks](../dqo-concepts/configuring-data-quality-checks-and-rules.md) and apply alerting rules