Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float no longer supports Inf #2412

Closed
greschd opened this issue Jan 22, 2019 · 24 comments
Closed

Float no longer supports Inf #2412

greschd opened this issue Jan 22, 2019 · 24 comments
Assignees
Labels
type/wontfix apply only to closed issues
Milestone

Comments

@greschd
Copy link
Member

greschd commented Jan 22, 2019

(decided to open a separate issue instead of discussing on the merged PR)

In #2129, an error is raised when a Float is either nan or inf. The reasoning is that the numeric type of postgres doesn't support these special values. However, as far as I can tell Float is stored as a postgres float, not numeric. I have used Inf in particular extensively without any issues in previous versions.

Is there an error which led to the change in #2129, or was that just cautionary?

@giovannipizzi
Copy link
Member

Good question - probably it was just cautionary? @dev-zero do you remember the discussion about this?
Maybe it was also related to hashing?

@greschd
Copy link
Member Author

greschd commented Jan 25, 2019

For what it's worth, hashing still seems to work in my temporary branch where I removed these checks.

@dev-zero
Copy link
Contributor

dev-zero commented Jan 25, 2019 via email

@greschd
Copy link
Member Author

greschd commented Jan 25, 2019 via email

@sphuber sphuber added this to the v1.0.0 milestone Mar 5, 2019
@sphuber sphuber added the priority/critical-blocking must be resolved before next release label Apr 3, 2019
@sphuber
Copy link
Contributor

sphuber commented Apr 12, 2019

Since this will not be a breaking change as at worst more will be allowed, this is postponed to after v1.0.0

@sphuber sphuber modified the milestones: v1.0.0, v1.1.0 Apr 12, 2019
@CasperWA
Copy link
Contributor

I have recently had a problem with this, trying to import StructureData with NaN values.

Looking around, I found a partial solution for the -Inf and Inf values, in the form of using the values -2e308 and 2e308, respectively.
All credit to the ROOT issue here and the related discussion here.
The numbers are legal JSON values, while they are also automatically parsed as -inf and inf in Python.

However, this still leaves an easy fix for NaN.

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

Looking at this again, I don't completely understand what the issue is with JSON. Does Python maybe add some magic? The following works:

In [1]: import json 
In [2]: json.loads(json.dumps(float('inf')))    
Out[2]: inf

In [3]: json.loads(json.dumps(float('nan'))) 
Out[3]: nan

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

Ah indeed, the python json module has an allow_nan flag (default: True). By default, it doesn't strictly comply with the JSON standard, and instead adds support for inf and nan: https://docs.python.org/3.5/library/json.html#json.dump

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

So the question is this: Do we need to be strictly standard-compliant, or is the Python implementation the only one that will ever see our JSONs?

@sphuber
Copy link
Contributor

sphuber commented Oct 11, 2019

We don't just rely on being able to dump to file though. Both database backends now use JSONB fields for the attributes and extras. I don't know for sure but maybe Postgres' implementation respects the JSON spec strictly. Try to store and attribute with float('inf') as value

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

Ah, indeed, that was the reason.

So I guess that leads us back to @dev-zero's suggestion above to split implementations of clean_value between values that will be stored as JSON and those that won't (like for Float).

For #3415 this means we probably won't allow inf and nan in that context, right?

@sphuber
Copy link
Contributor

sphuber commented Oct 11, 2019

even the value of Float will be stored in the attributes column and therefore in the JSONB field

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

Ah, I see.. this is a more recent change, right? As far as I can remember, when we initially discussed this I could "fix" Float by just un-commenting the check in clean_value.

What's the rationale for not storing it as a float field?

@sphuber
Copy link
Contributor

sphuber commented Oct 11, 2019

Because all data properties are stored as attributes. It used to be possible because in the old EAV schema (for Django) we did have float fields for float values. However, now the attributes are just stored in a single JSONB field.

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

Makes sense - that's probably also better for consistency across DB backends. Do you have any other ideas for how to solve this issue? Or should we close it because it's not likely to be implemented any time soon?

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

We could take inspiration from the python JSON implementation, and just auto-convert inf and nan to strings 'Infinity' and 'NaN' -- that could be done by modifying the value getter and setter in the Float class. Any thoughts on that?

@giovannipizzi
Copy link
Member

We can do conversion from those values to floats when storing if we don't care about comparison directly in the DB and type mismatch is not a problem. Probably, could be a solution with #3415 (note however that the query builder comparison would be the one for NaN always returning False, and not the one for infinity, i.e. for instance 3 < 'Infinity' is False.
For this reason, I am not sure this is the best approach in general.

Also, what I would be also in general against is to infer from a string if it used to be a float('inf') (at least in a general way in AiiDA - it's probably ok to do for a specific field, in a data subclass or similar, if you know what data values you expect, even if the problem with comparison remains) - we did it for dates (now dropped) and beside slowing down things, it is not very robust and would lead to unexpected things like you store the string 'Infinity' and when you get it back it's a float('inf').

@greschd
Copy link
Member Author

greschd commented Oct 12, 2019

I'm not sure I completely understand the first point -- is it that the QueryBuilder (any maybe other things) look directly at the attributes, and don't go through the .value property? In that case, I agree that it is problematic.

Regarding the second point, to be clear: This suggestion would be for the Float class only. Since there we know the value attribute should be a float, it would be possible to do this conversion. For generic attributes it would indeed be a bad idea - precisely because you can't know if it's supposed to be the string 'Infinity' or a float that was converted.

@greschd
Copy link
Member Author

greschd commented Oct 12, 2019

This also wouldn't solve #3415, since there it's an attribute of a different class.

@giovannipizzi
Copy link
Member

is it that the QueryBuilder (any maybe other things) look directly at the attributes, and don't go through the .value property?

Correct, this is for efficiency reasons (delegating the query to the DB and not fetching all data first in python to run .value and then filtering.

Regarding the second point, to be clear: This suggestion would be for the Float class only.

I see - for the Float class only one might think to do it, but it would make the behavior very different from when putting a float as a value in a list or dict, so I'm not sure it's better to support these values only for single Floats

@greschd
Copy link
Member Author

greschd commented Oct 12, 2019

Correct, this is for efficiency reasons (delegating the query to the DB and not fetching all data first in python to run .value and then filtering.

I think this makes it a bad idea to do some magic in the vaue property. After all, the unprocessed value can leak anyway, which could lead to subtle bugs / inconsistencies.

Unless there are other suggestions, I'd vote to close this issue as "wontfix".

@sphuber sphuber added type/wontfix apply only to closed issues and removed priority/critical-blocking must be resolved before next release requires discussion type/bug labels Oct 16, 2019
@sphuber
Copy link
Contributor

sphuber commented Oct 16, 2019

I agree, so will close it for now

@sphuber sphuber closed this as completed Oct 16, 2019
@greschd
Copy link
Member Author

greschd commented Oct 22, 2019

I'm reopening this because there is still a remaining issue: Since previous versions of AiiDA supported NaN / Inf, they can be in existing exports -- these break when trying to import on a new AiiDA instance. I'm also not sure if a migration path exists.

@greschd greschd reopened this Oct 22, 2019
@greschd greschd changed the title Float no longer supports Inf Migration for Inf / NaN float values. Oct 22, 2019
@greschd
Copy link
Member Author

greschd commented Oct 22, 2019

Could also be a separate issue, but I think the discussion here is valuable.

Yeah no, I'll make a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/wontfix apply only to closed issues
Projects
None yet
Development

No branches or pull requests

5 participants