
Hex floating-point numbers no longer parse in 1.10.5 #2316

Closed
scottstanfield opened this issue Aug 21, 2017 · 3 comments

@scottstanfield

We know data.table can read floating-point numbers in a CSV file just fine. What surprised me is that `fread` in data.table 1.10.4 could also read Java-style hexadecimal floating-point numbers.

If you're not familiar with them, they look like 0x1.21d6353p-1, which is roughly 0.566087. Section 3.10.2 (Floating-Point Literals) of the Java Language Specification has more details.
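Working the example through by hand: the hex digits after `0x` form the significand, each of the seven digits after the point contributes one more factor of 16 to the denominator, and the number after `p` is a power-of-two (not power-of-ten) exponent:

```latex
\texttt{0x1.21d6353p-1}
  = \left(1 + \frac{\texttt{0x21d6353}}{16^{7}}\right) \times 2^{-1}
  = \left(1 + \frac{35480403}{268435456}\right) \times 2^{-1}
  \approx 1.1321748 \times 0.5
  \approx 0.5660874
```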

data.table 1.10.4 reads them as numbers, but 1.10.5 reads them as strings.

My source CSV file is simple POJO output from H2O.ai with three columns: the prediction target (true/false) and the probability of each outcome (columns FALSE and TRUE). A sample file is attached.

Steps to reproduce are trivial, as you can see in the attached screenshots. I'm expecting 1.10.5 to produce the same results as 1.10.4. I don't have a workaround yet.

[Screenshot: data.table 1.10.5 (current top-of-tree as of 2017-08-21)]

[Screenshot: data.table 1.10.4]

The CSV file (I had to upload it as a .txt file): pojo.txt

@scottstanfield
Author

While poking around the source to see when this regressed, I noticed this comment in foverlaps.R:

# Conclusion: floating point manipulations are hell!

@scottstanfield
Author

My workaround in 1.10.5: because my column names (FALSE and TRUE) are also reserved words in R (yeah, I know), I refer to the columns by position. Always happy to play code golf and find a shorter or faster way to do this.

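# Works in 1.10.5 (also runs in 1.10.4, where it is unnecessary)
library(data.table)
d <- fread("pojo.txt")  # the attached sample file

cols <- c(2, 3)  # my FALSE and TRUE columns are at positions 2 and 3
# Convert the hex-float strings to doubles by reference
d[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]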
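The workaround relies on base R's string-to-double conversion understanding the `0x...p...` form. A minimal sketch, assuming only base R (the variable names `x` and `mantissa` are illustrative), checks this on the value from the report and compares it with the significand-times-power-of-two reading of the literal:

```r
# as.numeric() accepts hexadecimal strings with a binary exponent ("p-1")
x <- as.numeric("0x1.21d6353p-1")
print(x, digits = 7)   # [1] 0.5660874

# Rebuild the same value by hand: 1 + 0x21d6353 / 16^7, then scale by 2^-1
mantissa <- 1 + strtoi("21d6353", base = 16L) / 16^7
stopifnot(isTRUE(all.equal(x, mantissa * 2^-1)))
```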

@st-pasha
Contributor

The workaround is no longer needed: `fread` is now able to parse these numbers directly.
