Nullable integer data type bug in read_csv #32453

AlexanderSpirin · 2020-03-05T07:45:30Z

When reading a CSV file with `dtype='Int64'`, an integer column that contains NA fields loses precision. I think the cause is a cast to float somewhere under the hood.

/tmp/1.csv:

field1,field2
,1577134800018164904
1577134800018226901,1577134800018267738

import pandas as pd
A = pd.read_csv('/tmp/1.csv', dtype='Int64')
A

	field1	field2
0		1577134800018164904
1	1577134800018226944	1577134800018267738

int(float(1577134800018226901))
1577134800018226944
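
To make the float explanation concrete, here is a minimal pure-Python sketch (no pandas involved) of why a round trip through float64 changes this particular value. It only illustrates the arithmetic and assumes, as suspected above, that the column passes through a float64 intermediate; it does not show where in pandas that happens. The variable names are illustrative.

```python
# field1 value from /tmp/1.csv in the report above
value = 1577134800018226901

# float64 has a 53-bit significand, so not every integer above 2**53
# can be represented exactly.
assert value > 2**53

# The value lies between 2**60 and 2**61, where adjacent float64 values
# are 2**(60 - 52) = 256 apart.
assert 2**60 <= value < 2**61

round_tripped = int(float(value))
print(round_tripped)          # 1577134800018226944
print(round_tripped - value)  # 43
print(round_tripped % 256)    # 0, i.e. the result sits on the 256-wide grid
```

The rounded result matches the value shown in the `field1` column of the DataFrame, which is consistent with a float conversion being applied only to the column that contains an NA field.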

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.6.1.final.0
python-bits : 64
OS : Linux
OS-release : 3.16.0-77-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 28.8.0
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0


jorisvandenbossche · 2020-03-10T14:07:39Z

@AlexanderSpirin Thanks for the report!
I think this is caused by #30268, and the actual CSV issue is also reported at #32134

So closing as a duplicate of #32134

jorisvandenbossche closed this as completed Mar 10, 2020

jorisvandenbossche added the Duplicate Report label Mar 10, 2020

jorisvandenbossche added this to the No action milestone Mar 10, 2020
