Describe the bug
The csv reader has a parameter called na_filter, which decides whether any values should be read as nulls in the dataframe when reading CSV content. This parameter appears to be non-functional.
Expected behavior
When na_filter is False, the dataframe should not contain any nulls; empty values should be read as empty strings, as pandas does.
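The expected behavior can be illustrated with pandas itself. This is a minimal sketch: the CSV content and variable names are assumed for illustration and are not taken from the original report.

```python
import io

import pandas as pd

# Assumed sample data: each row has one empty field.
csv_text = "a,b\n1,\n,x\n"

# Default behavior: empty fields are detected as NA.
df_default = pd.read_csv(io.StringIO(csv_text))

# With na_filter=False, NA detection is disabled and empty
# fields are kept as empty strings.
df_no_filter = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(df_default.isna().sum().sum())    # count of nulls with NA detection on
print(df_no_filter.isna().sum().sum())  # count of nulls with NA detection off
print(df_no_filter["b"].tolist())
```

Passing the same data and na_filter=False to the cuDF csv reader is the case this issue reports as still producing nulls.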
Environment overview (please complete the following information)
Environment location: [Bare-metal]
Method of cuDF install: [from source]
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
na_filter is partially functional: it stops the na_values from being detected as null/NaN. What seems to be missing is a change to the logic that treats empty fields as null. Good catch.
There may be more that can be done with this parameter, like creating non-nullable output columns.
vuule changed the title from "[BUG] csv reader parameter na_filter is non-functional" to "[BUG] csv reader - empty fields treated as null even with na_filter=False" on Nov 5, 2020
Fixes #6682, #6680
Currently, empty fields are treated as N/A regardless of parsing options. However, the desired behavior is to handle empty fields the same way as fields with special values (apply default_na_values, na_filter logic).
This PR irons out the behavior so it matches Pandas in this regard.
- Tries now support matching empty strings.
- The list of special NA values is now generated more robustly, so it has correct elements in any parameter combination.
- Empty string is added to the list of special NA values.
- The quoted empty string ("\"\"") is added to the NA value list if the empty string ("") is included (mirrors Pandas behavior).
- Added tests for previously failing parameter combinations.
- Reworked some of the tests to check against Pandas results instead of assumed desired behavior.
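The NA value list rules described above can be sketched in plain Python. This is not the cuDF implementation: build_na_values and DEFAULT_NA_VALUES are hypothetical names, the default set is only a small illustrative subset, and the parameter handling assumes the na_values, keep_default_na, and na_filter semantics documented for pandas.

```python
from typing import Iterable, Optional, Set

# Illustrative subset of strings read as NA by default; not the full list.
DEFAULT_NA_VALUES = {"", "NA", "NaN", "N/A", "null"}


def build_na_values(
    na_values: Optional[Iterable[str]] = None,
    keep_default_na: bool = True,
    na_filter: bool = True,
) -> Set[str]:
    """Hypothetical helper: return the set of field strings read as null."""
    if not na_filter:
        # NA detection is off: nothing is null, not even empty fields.
        return set()
    # Start from the defaults only when the caller wants to keep them.
    result = set(DEFAULT_NA_VALUES) if keep_default_na else set()
    # Add any user-supplied NA strings.
    result.update(na_values or [])
    if "" in result:
        # A quoted empty field ("") is handled like an unquoted empty field.
        result.add('""')
    return result


print(sorted(build_na_values()))
print(sorted(build_na_values(["-"], keep_default_na=False)))
print(sorted(build_na_values(na_filter=False)))
```

Because the empty string is just another entry in the set, an empty field is treated as null only when the parameter combination actually includes it, which is the behavior the list above describes.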
Authors:
- vuule <vmilovanovic@nvidia.com>
- vuule <vukasin.milovanovic.87@gmail.com>
- Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com>
- Vukasin Milovanovic <vmilovanovic@nvidia.com>
Approvers:
- Ram (Ramakrishna Prabhu)
- Christopher Harris
- Keith Kraus
URL: #6922
Additional context
Surfaced while running fuzz tests: #6001