Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

writeData converts character column to numeric value #264

Closed
chicotobi opened this issue Sep 29, 2021 · 12 comments
Closed

writeData converts character column to numeric value #264

chicotobi opened this issue Sep 29, 2021 · 12 comments

Comments

@chicotobi
Copy link

Describe the bug

Function "base::format" is used to convert a numeric value 1234.1 to a character "1234.1" in a data.frame using dplyr , writeData writes the string "1234.0999999999999". I don't fully understand what's happening. Maybe this is a problem of lazy evaluation.

The bug doesn't appear, if I use base::paste.

The bug doesn't appear, if the data.frame is bound to a variable and then written

To Reproduce

library(openxlsx)
library(magrittr)
library(dplyr)
library(XML)

fileName <- "test.xlsx"
wb <- createWorkbook()
sheet <- addWorksheet(wb,"Sheet")
data.frame(a=1234.1) %>% mutate(a=format(a)) %>% writeData(wb,sheet,.)
saveWorkbook(wb,fileName,overwrite=T)
unzip(fileName,exdir="tmp")
XML::xmlParse("tmp/xl/sharedStrings.xml")

Output

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
  <si>
    <t xml:space="preserve">a</t>
  </si>
  <si>
    <t xml:space="preserve">1234.0999999999999</t>
  </si>
</sst>

Expected behavior
String "1234.1" instead of a weird casted floating point number.

Additional context

This works:

library(openxlsx)
library(magrittr)
library(dplyr)
library(XML)

fileName <- "test.xlsx"
wb <- createWorkbook()
sheet <- addWorksheet(wb,"Sheet")
data.frame(a=1234.1) %>% mutate(a=paste(a)) %>% writeData(wb,sheet,.)
saveWorkbook(wb,fileName,overwrite=T)
unzip(fileName,exdir="tmp")
XML::xmlParse("tmp/xl/sharedStrings.xml")

This also works:

library(openxlsx)
library(magrittr)
library(dplyr)
library(XML)

fileName <- "test.xlsx"
wb <- createWorkbook()
sheet <- addWorksheet(wb,"Sheet")
df <- data.frame(a=1234.1) %>% mutate(a=format(a))
df %>% writeData(wb,sheet,.)
saveWorkbook(wb,fileName,overwrite=T)
unzip(fileName,exdir="tmp")
XML::xmlParse("tmp/xl/sharedStrings.xml")
@jmbarbone
Copy link
Contributor

Simpler reprex.

Doesn't seem to be anything with dplyr. Looks like write.xlsx() gets this right but writeData() or

library(openxlsx)

wbfile <- tempfile()
zipfile <- tempfile()
write.xlsx(list(Sheet = data.frame(a = format(1234.1))), wbfile)
unzip(wbfile, exdir = zipfile)
XML::xmlParse(file.path(zipfile, "xl/sharedStrings.xml"))
#> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
#> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
#>   <si>
#>     <t xml:space="preserve">a</t>
#>   </si>
#>   <si>
#>     <t xml:space="preserve">1234.1</t>
#>   </si>
#> </sst>
#> 

Created on 2021-09-29 by the reprex package (v2.0.1)

library(openxlsx)

wbfile <- tempfile()
zipfile <- tempfile()
wb <- createWorkbook()
addWorksheet(wb, "Sheet")
writeData(wb, "Sheet", data.frame(a = format(1234.1)))
saveWorkbook(wb, wbfile)
unzip(wbfile, exdir = zipfile)
XML::xmlParse(file.path(zipfile, "xl/sharedStrings.xml"))
#> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
#> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
#>   <si>
#>     <t xml:space="preserve">a</t>
#>   </si>
#>   <si>
#>     <t xml:space="preserve">1234.0999999999999</t>
#>   </si>
#> </sst>
#> 

Created on 2021-09-29 by the reprex package (v2.0.1)

@JanMarvin
Copy link
Collaborator

Looks like its evaluated with digits = 17? Is this caused by us or simply an overflow of R's default precision or 16 digits? Nothing in writeData looks like its modifying anything, therefore I tend to agree that its not our bug.

> data.frame(a = format(1234.1, digits = 17))
                   a
1 1234.0999999999999

@chicotobi
Copy link
Author

chicotobi commented Sep 30, 2021

JanMarvin, I agree, maybe it's a bug somewhere else. But to me the objects look identical. Can you give me a way different than utils::str to see the difference? This is the output for my three provided cases (first case gives the error, second and third work with writeData).

> data.frame(a=format(1234.1)) %>% str
'data.frame':	1 obs. of  1 variable:
 $ a: chr "1234.1"
> data.frame(a=paste(1234.1)) %>% str
'data.frame':	1 obs. of  1 variable:
 $ a: chr "1234.1"
> df <- data.frame(a=format(1234.1))
> df %>% str
'data.frame':	1 obs. of  1 variable:
 $ a: chr "1234.1"

I see this output as: base::format converts the number to a string with six characters, namely '1' '2' '3' '4' '.' and '1'. There is no number at this point. It could be 'foobar' and it should be the same for R.

> data.frame(a='foobar') %>% str
'data.frame':	1 obs. of  1 variable:
 $ a: chr "foobar"

I don't know if the expression in base::format is somehow a "lazy evaluation r-value" where the floating point value still exists somewhere.

Edit: Obviously it works with foobar:

library(openxlsx)

wbfile <- tempfile()
zipfile <- tempfile()
wb <- createWorkbook()
addWorksheet(wb, "Sheet")
writeData(wb, "Sheet", data.frame(a ="foobar"))
saveWorkbook(wb, wbfile)
unzip(wbfile, exdir = zipfile)
XML::xmlParse(file.path(zipfile, "xl/sharedStrings.xml"))

gives

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
  <si>
    <t xml:space="preserve">a</t>
  </si>
  <si>
    <t xml:space="preserve">foobar</t>
  </si>
</sst>

@chicotobi
Copy link
Author

chicotobi commented Sep 30, 2021

Solution:

str forces evaluation in current environment, that's why they look identical. The objects are not identical. One is a "real numbers and strings"-object and the other is still a dangling lazily-to-be-evaluated expression with default arguments for digits.

I wrote about the rest on my StackOverflow-Post.

https://stackoverflow.com/questions/69378224/why-do-these-commands-give-different-results

One suggestion for you guys: Can you force evaluation of the data.frame argument ("casting it to bare-metal numbers and strings") within the top-level environment before accepting it as an argument? I don't know if R allows that, or if the outer environment is literally "out-of-scope".

Can be closed, and I will indeed close it.

@JanMarvin
Copy link
Collaborator

The MWE by @jmbarbone gave me a similar impression. Embedding things in the tidyverse made it harder to spot. We should investigate where this digits option comes from (it's obviously differing from the default global environment). Especially now that R brings it's own pipe and people might use this more. PR are welcome.
For the time being there are other ways to convert numbers to characters e.g. as.character.

@JanMarvin
Copy link
Collaborator

FYI It's a side effect from here:

openxlsx/R/utils.R

Lines 41 to 50 in 0694c7a

get_set_options <- function() {
op <- options()
options(
# increase scipen to avoid writing in scientific
scipen = 200,
OutDec = ".",
digits = 22
)
op
}

@JanMarvin JanMarvin reopened this Sep 30, 2021
@JanMarvin
Copy link
Collaborator

Something like eval(x, envir = globalenv()) prior to entering our environment would be a quick and dirty solution, although it's kind of a hack. If someone has a better solution ...

@chicotobi
Copy link
Author

chicotobi commented Sep 30, 2021

Something like eval(x, envir = globalenv()) prior to entering our environment would be a quick and dirty solution, although it's kind of a hack. If someone has a better solution ...

I thought so too, but it doesn't work:

library(magrittr)
rlang::scoped_options(my_var="foo")
f <- function(x) {
  rlang::scoped_options(my_var="bar")
  cat(x$a)
}
data.frame(a=options()$my_var) %>% eval(envir=globalenv()) %>% f()

> bar

I mean, am I stupid or what is going on?

@JanMarvin
Copy link
Collaborator

JanMarvin commented Sep 30, 2021

I suggest to add the eval into the function

library(magrittr)
options(my_var="foo")
f <- function(x) {
  eval(x, envir=globalenv())
  rlang::scoped_options(my_var="bar")
  cat(x$a)
}
data.frame(a=options()$my_var) %>% f()

@chicotobi
Copy link
Author

chicotobi commented Sep 30, 2021

Ah yeah, that would be a solution within your code. So it works, because in my case, eval is still part of a piping-chain, but in your case, eval is really it's own expression and get's executed before the scoped options change.

Oh man, I really don't like where all this is going... you don't even need to call anything. Simply writing the symbol is enough to trigger the evaluation, and then x is fixed in memory.

library(magrittr)
options(my_var="foo")
f <- function(x) {
  x
  rlang::scoped_options(my_var="bar")
  cat(x$a)
}
data.frame(a=options()$my_var) %>% f()

@JanMarvin
Copy link
Collaborator

I'll have a look. I assume that calling x the way you did does actually do the same thing. In our code I fear that the environment might be impacted differently. It might be that the option setting function is called early on.
In addition, I prefer the eval part because at least for me it's easier to spot the impact. Though maybe parent environment might be the better call? For now I consider it a minor bug in our code, which was not written with a pipe approach in mind.

@jmbarbone
Copy link
Contributor

I may have found a simple solution by adding force(x) at the beginning of writeData(), before any options are set. Seems to do the trick. I'll submit a PR soon.

jmbarbone added a commit to jmbarbone/openxlsx that referenced this issue Sep 30, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jan 4, 2025
# openxlsx 4.2.7.1

* It's now possible to insert a hyperlinked image by passing a URL,
  relative or absolute file path, or mailto string to the new
  `address` parameter of `insertImage()`.

# openxlsx 4.2.7

* Fixed warning on `dataValidation(..., type = "list")`
  ([#342](ycphs/openxlsx#342))

* Added optional argument to `loadWorkbook` to decide if empty/blank
  cells should be converted to NA_character_ (the default) or left
  blank as is

* `saveWorkbook()` now succeeds when called after the user has set
  column widths for a range of columns (e.g. 1:2), saved the workbook,
  then set column widths for a new range that is inclusive of the
  previous one (e.g. 1:5)
  ([#493](ycphs/openxlsx#493)).

## Improvements

* Improve detectDates
  ([#288](ycphs/openxlsx#288))

* Preserve window size and position, also `getWindowSize()` and
  `setWindowSize()`
  ([466](ycphs/openxlsx#466))

# openxlsx 4.2.6

* Fix external links
  ([#410](ycphs/openxlsx#410))

* Do not add unneccessary sheetPr node
  ([#409](ycphs/openxlsx#409))

* Add support for `namedRegion`s having dots and other special
  characters ([#338](ycphs/openxlsx#338)).

* Add type blanks and not blanks to conditional formatting
  ([#311](ycphs/openxlsx#311))

# openxlsx 4.2.5

## Fixes

* `openxlsx_setOp()` now works with named list
  ([#215](ycphs/openxlsx#215))

* `loadWorkbook()` imports `inlineStr`. Values remain `inlineStr` when
  writing the workbook with `saveWorkbook()`. Similar `read.xlsx` and
  `readWorkbook` import `inlineStr`.

* `read.xlsx()` no longer changes random seed
  ([#183](ycphs/openxlsx#183))

* fixed a regression that caused fonts to be read in incorrectly
  ([#207](ycphs/openxlsx#207))

* add option to save as read only recommended
  ([#201](ycphs/openxlsx#201))

* fixed writing hyperlink formulas
  ([#200](ycphs/openxlsx#200))

* `write.xlsx()` now throws an error if it doesn't have write
  permissions ([#190](ycphs/openxlsx#190))

* `write.xlsx()` now again uses the default of `overwrite = TRUE` for
  saving files ([#249](ycphs/openxlsx#249))

* `as.character.formula()` exported to warn about potential conflicts
  with other packages
  ([#312](ycphs/openxlsx#312),
  [#315](ycphs/openxlsx#315))

## Improvements

* `options()` are more consistently set in functions (see:
  [#289](ycphs/openxlsx#262))

* `Workbook$show()` no longer fails when called in a 0 sheet
  workbook([#240](ycphs/openxlsx#240))

* `read.xlsx()` again accepts `.xlsm` files
([#205](ycphs/openxlsx#205),
[#209](ycphs/openxlsx#209))

* `makeHyperlinkString()` does no longer require a sheet argument
  ([#57](ycphs/openxlsx#57),
  [#58](ycphs/openxlsx#58))

* improvements in how `openxlsx` creates temporary directories (see
  [#262](ycphs/openxlsx#262))

* `writeData()` calls `force(x)` to evaluate the object before options
  are set ([#264](ycphs/openxlsx#264))

* `createComment()` now correctly handles `integers` in `width` and
  `height` ([#275](ycphs/openxlsx#275))

* `setStyles()` accepts `halign="justify"`
  ([#305](ycphs/openxlsx#305))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants