"For sensitive data like finance data, the dataframe has to be exactly the same as the csv file."
It is not actually possible to store exact decimal values in floating point. Binary numbers (like float32) cannot exactly represent all decimal numbers. For example, try entering 8.6093 into the Decimal Representation box on this site: https://www.h-schmidt.net/FloatConverter/IEEE754.html
You will see that the number actually stored in float32 is 8.60929965972900390625, which when rounded to 6 decimal places is 8.609300.
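The same check can be done locally without the web converter. This is a minimal sketch using only the Python standard library; the helper name `to_float32` is made up for illustration, and it stands in for whatever cudf or pandas does internally when a column is stored as float32.

```python
import struct
from decimal import Decimal


def to_float32(value: float) -> float:
    """Round a Python float to the nearest float32, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", value))[0]


stored = to_float32(8.6093)

# Decimal(float) expands the binary value exactly, with no rounding.
exact = Decimal(stored)

# The same stored value printed at two different precisions.
four_digits = f"{stored:.4f}"
six_digits = f"{stored:.6f}"

print("exact value stored:", exact)
print("4 decimal places:  ", four_digits)
print("6 decimal places:  ", six_digits)
```

The stored value is identical in both cases; only the number of digits requested at print time differs.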
The issue here is just a matter of formatting the output -- Pandas formats to 4 digits while cudf is formatting to 6 digits. Perhaps we should change the default formatting to match.
In any case, this has nothing to do with read_csv and is just an output formatting issue. @kkraus14 may have an opinion on what action should be taken.
I see. Thank you. Yeah, I tried a different csv file and confirmed it is just a formatting thing, and it is consistent. In my opinion, maybe no action is needed.
Describe the bug
For example, csv reader of float32 sometimes shows 6 digits when the column in csv file only has 4 digits. The extra digits are mostly 00, 01 or 99 in the end.
Steps/Code to reproduce bug
When reading a sample csv file as follows:
The code is as follows:
The output is as follows and the mismatch is in the last row of var_0.
Expected behavior
The same output as pandas and as the input csv file.
Environment details (please complete the following information):
0.6.1+0.gbeb4ef3.dirty
Additional context
For sensitive data like finance data, the dataframe has to be exactly the same as the csv file.
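If the values really must match the file digit for digit, one option outside the dataframe libraries is to parse them as `decimal.Decimal` instead of a binary float. The sketch below uses only the Python standard library; the CSV content is a hypothetical stand-in (the original sample file is not shown here), and only the column name `var_0` and the value 8.6093 come from this thread.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample data; not the file from the original report.
sample_csv = "var_0\n8.6093\n0.1\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Decimal is built from the text, so it keeps exactly the digits in the file.
as_decimal = [Decimal(row["var_0"]) for row in rows]

# float goes through binary floating point and only approximates the text.
as_float = [float(row["var_0"]) for row in rows]

for text_value, binary_value in zip(as_decimal, as_float):
    print(text_value, "vs stored binary value", Decimal(binary_value))
```

This trades speed and GPU support for exactness, so it is an illustration of the representation issue, not a suggestion for how cudf should behave.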