Summing up empty data #1001
Comments
@Poshi this absolutely sounds like a miss, perhaps in the Go port. I absolutely treated empties as zeroes, documented that, regression-tested for it. Let me see what I missed in the port.
@Poshi I stand corrected. Miller 5 -> 6 (C -> Go) didn't change behavior here. What I was recalling above as being "absolutely" true was the behavior for
We really should have similar semantics between
This would be a slightly backward-incompatible change, although (for strict semantic versioning) I would hesitate to call this Miller 7.0.0 -- we can put this in Miller 6.3.0.
Design: https://miller.readthedocs.io/en/latest/reference-main-null-data/#rules-for-null-handling
In that wording (which goes back a long time) I clearly intended empty plus number is number. So the code (C and Go) and the docs are consistent. The only thing that's inconsistent (and very much so) is how this compares columnwise summations like
Workaround:
@johnkerl what can I read to understand the meaning of
Thank you
I disagree. In that page, you literally say:
"an empty x should make the sum non-numeric"... which is just the opposite of your words in this message, "empty plus number is number". The point is that it makes sense that an empty value should affect operations and an absent one should not. But in homogeneous CSV files, you cannot distinguish empty from absent :-(
Thanks @Poshi ! You're right, you found another inconsistency. To be clear though -- I very much think this change needs to be made; I'm in agreement. Thanks for helping me correctly catalog various things written in various places over the years -- helping to find the doc bits (as well as the code, obviously) that all need to be changed to make this happen.
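To make the empty-versus-absent distinction from this thread concrete, here is a minimal Python sketch. It is not Miller code and does not claim to reproduce Miller's behavior; the helper `add_field` and the field names `x` and `y` are hypothetical. It models the rule "an absent field is skipped, an empty field makes the result empty" and shows why a homogeneous CSV reader can only ever produce the empty case.

```python
import csv
import io

def add_field(total, record, key):
    """Hypothetical rule: absent fields are skipped, empty fields empty the total."""
    if key not in record:                 # absent: the record has no such field
        return total
    if total == "" or record[key] == "":  # empty: the field exists but has no value
        return ""
    return total + int(record[key])

def row_sum(record, keys):
    total = 0
    for key in keys:
        total = add_field(total, record, key)
    return total

# A heterogeneous record can simply omit y (absent).
absent_result = row_sum({"x": "3"}, ["x", "y"])

# A homogeneous CSV row must carry every column, so a missing y shows up as "".
csv_row = next(csv.DictReader(io.StringIO("x,y\n3,\n")))
empty_result = row_sum(csv_row, ["x", "y"])

print(repr(absent_result))  # 3
print(repr(empty_result))   # ''
```

Under these assumed rules the same "missing" value gives two different sums, depending only on whether the file format can express absence.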
@aborruso
I found a small lack of documentation and an unexpected result while adding data.
I'm using an input in homogeneous CSV format, so all data is either empty or present and defined, but never absent. It could be something like:
I'm trying to perform the addition of these three fields, expecting mlr to interpret the empty fields as zeroes, as if they were absent, to minimize keystrokes (following the policy of simplifying the user's life with non-present data). These are the results:
Only when all the data is non-empty is the addition performed. From the documentation I can see that this is expected. The reasoning behind it is that, conceptually, absent and empty are different things and have different meanings, but in a homogeneous CSV context where absents are not possible, we hit this issue. In my specific case, the columns are file sizes, and the empty ones are for files that do not exist (so they are absent), but the only possible representation is leaving the field blank in the file.
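The difference between the observed and the expected result can be sketched in Python. This is an illustrative model only, not Miller's implementation: the column names `a`, `b`, `c` and the sample values are made up, and `sum_strict` / `sum_lenient` are hypothetical helpers standing in for "any empty operand gives an empty result" and "empty operands count as zero".

```python
import csv
import io

# Hypothetical homogeneous CSV: every record has all three fields,
# but some values are empty. Names and numbers are illustrative only.
DATA = """a,b,c
1,2,3
4,,6
,,9
"""

def sum_strict(values):
    """Return '' if any operand is empty, otherwise the integer sum."""
    values = list(values)
    if any(v == "" for v in values):
        return ""
    return sum(int(v) for v in values)

def sum_lenient(values):
    """Treat empty operands as zero (the behavior requested in this issue)."""
    return sum(int(v) for v in values if v != "")

rows = list(csv.DictReader(io.StringIO(DATA)))
strict = [sum_strict(r.values()) for r in rows]
lenient = [sum_lenient(r.values()) for r in rows]

print(strict)   # [6, '', '']
print(lenient)  # [6, 10, 9]
```

With file sizes as the data, the lenient rule gives the total a user would expect when a blank cell means "no such file".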
Well, up to this point it is just food for thought (the decisions and the implementation have already been made, so...).
But I also identified a point in the documentation that can be easily expanded and will help clarify this point:
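I saw that in the reference-main-null-data help page, the table at the end (arithmetic rules) does not contain an entry for empty data, only for absents and errors. If the behavior must be different from absent, it should be included here too.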
data, only for absents and errors. If the behavior must be different from absent, it should be included here too.The text was updated successfully, but these errors were encountered: