`read.xlsx(detectDates = TRUE)` failing #288

jmbarbone · 2021-11-07T02:44:28Z

Reporting again here as we still have this error

openxlsx::read.xlsx("C:/Users/jmbar/Downloads/bad.xlsx", detectDates = TRUE)
#> Error in read_workbook(cols_in = cell_cols, rows_in = cell_rows, v = v, : basic_string::substr: __pos (which is 8) > this->size() (which is 7)

^{Created on 2021-11-06 by the reprex package (v2.0.1)}

Original

I can confirm the issue is still there. See the file attached, opening it with detectDates=TRUE raises an error:
Error reading date:
44489.4
row: 1
col: 1
Error in read_workbook(cols_in = cell_cols, rows_in = cell_rows, v = v,  : 
  basic_string::substr: __pos (which is 8) > this->size() (which is 7)
bad.xlsx

Originally posted by @aushev in awalker89/openxlsx#249 (comment)

The text was updated successfully, but these errors were encountered:

Improve detectDates. closes #288

ProfFancyPants · 2022-02-28T19:42:36Z

I have an additional file doing the exact same thing. Would you like me to upload a minimum example for experimental purposes?

JanMarvin · 2022-02-28T20:07:26Z

Hi @ProfFancyPants , I assume that we understand fairly well what is going on, the question is more or less, why does it happen. I have pushed a fix to the development branch. Please see if this fixes your issue. Though I assume it is only partially right, it should fix the issue, but it is a hack - solving and hiding a problem that should be fixed somewhere else.

ProfFancyPants · 2022-03-01T04:18:42Z

I checked it with the development branch and the issue still remains exactly as before. I was able to get the reader to do some additional interesting things when I deleted choice cells in the date column where it was loading but actually removing values in other columns. If the bulk of this issue isn't in Apache I could help you take a look. What is so bafilling is that doing a complete copy value paste stops it completely, even with restoring all the previous formatting. My assumption was that there is a hidden or exotic character that looks exactly like the normal character but gets coerced back when value pasted.

JanMarvin · 2022-03-01T07:15:14Z

If the bulk of this issue isn't in Apache I could help you take a look.

I don't understand what Apache has to do with this issue. When I looked into the issue I've attempted to fix, we expected a string like "2022-03-01", but somehow still had a 7 character wide numeric like "11111.1". Therefore when looking for the part "-01" we fail and the error is thrown. Substring beginning at 8 requested, but only 7 characters provided.
My fix checked for the numeric and initiated a conversion from numeric to date. After this the string is long enough. Therefore, I assume this has already been fixed. Unknown to me is why we ended in this situation in the first place. The symptoms can be treated, but they are not the root of the evil.

If you want to look into this, you're ofc welcome :)

ProfFancyPants · 2022-03-01T18:45:10Z

I think the issue might be deeper and is related to how excel is saving the .XML files internally. In my test file, two end-user identical sheets down to the simplest reproducible example. By "end-user identical" I mean identical as far as it concerns someone using excel and using everything physically to make the two sheets identical. Divergence in the .XML files is between column and row style.

Exactly end-user identical sheets are being saved as different styles in XML. Once "_openxlsx_loadworksheets" reads in styleObjects(styleObjectsSEXP) and xmlFiles(xmlFilesSEXP) any manner of things could be happening, and it doesn't mean that loadworksheets is making the wrong choices based on what the row's "s=X" and "<v>X</v>" is telling it to do. Also, I haven't quite decerned where all the styleObjects are coming from because some aren't in the workbook anymore.

The worst part is that formatting order seems to matter. If I format the date field to one thing and change it back the .XML files aren't identical.

JanMarvin · 2022-03-01T19:05:03Z

Sorry, but I don't really get your point. The styles are from styles.xml, some in the <xf.../> and some custom formats are in <numfmt ...>. openxlsx creates styleObjects when loading and saving. Therefore our style IDs must not match those of Excel. The question to solve in this issue is: why is openxlsx currently identifying some cells as dates and why aren't they prepared correctly. Somewhere in the loading process we skip a date creation step. In Excel they are numerics with styles, we convert them from numerics to date/POSIX strings and try to use some substrings for another round of date creation. The entire process is a bit dubious and cannot really say why we're doing it in this specific way.

However, behind the scenes we're working hard on a successor to openxlsx and this problem does not exist in the new code. Therefore my time and interest to solve this one here is currently a bit limited. After all it's not affecting many people.

deschen1 · 2022-07-06T08:37:52Z

Just tried to read in the bad.xlsx file posted in the initial post with the potential fix in the development branch. Unfortunately, trying to open the file crashs/terminates R.

sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8    LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                    LC_TIME=German_Germany.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] openxlsx_4.2.5.1 testthat_3.1.4  

loaded via a namespace (and not attached):
 [1] zip_2.2.0         Rcpp_1.0.8.3      compiler_4.2.1    pillar_1.7.0      prettyunits_1.1.1 remotes_2.4.2     tools_4.2.1       digest_0.6.29     pkgbuild_1.3.1    pkgload_1.3.0     memoise_2.0.1     lifecycle_1.0.1   tibble_3.1.7      pkgconfig_2.0.3   rlang_1.0.3       cli_3.3.0        
[17] rstudioapi_0.13   commonmark_1.8.0  xfun_0.31         fastmap_1.1.0     xml2_1.3.3        knitr_1.39        stringr_1.4.0     roxygen2_7.2.0    withr_2.5.0       desc_1.4.1        fs_1.5.2          vctrs_0.4.1       devtools_2.4.3    rprojroot_2.0.3   glue_1.6.2        R6_2.5.1         
[33] processx_3.6.1    fansi_1.0.3       sessioninfo_1.2.2 callr_3.7.0       purrr_0.3.4       magrittr_2.0.3    ps_1.7.1          codetools_0.2-18  ellipsis_0.3.2    usethis_2.1.6     utf8_1.2.2        stringi_1.7.6     cachem_1.0.6      crayon_1.5.1      brio_1.1.3

cha-petersumm · 2023-03-30T21:44:04Z

I have the same issue, with a file that was simply written by openxlsx and then read back again (so Excel has never been near it).

write.xlsx(CVAD_list, "CVAD list.xlsx")
CVAD_list2 <- read.xlsx("CVAD list.xlsx")
CVAD_list3 <- read.xlsx("CVAD list.xlsx", detectDates = T)

Error message from the third line above is as follows:

Error reading date:
44696.5
row: 59
col: 6
Error: basic_string::substr: __pos (which is 8) > this->size() (which is 7)

There's nothing obviously odd about cell F59.

I'm happy to create an example that I can post if it would be useful.

packageVersion("openxlsx")
[1] ‘4.2.5.2’

github-actions · 2024-03-30T01:53:22Z

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

CarolusKwok · 2024-04-05T17:26:32Z

This is not stale, I got the same error just now. Is there a fix? thanks :D

# openxlsx 4.2.7.1 * It's now possible to insert a hyperlinked image by passing a URL, relative or absolute file path, or mailto string to the new `address` parameter of `insertImage()`. # openxlsx 4.2.7 * Fixed warning on `dataValidation(..., type = "list")` ([#342](ycphs/openxlsx#342)) * Added optional argument to `loadWorkbook` to decide if empty/blank cells should be converted to NA_character_ (the default) or left blank as is * `saveWorkbook()` now succeeds when called after the user has set column widths for a range of columns (e.g. 1:2), saved the workbook, then set column widths for a new range that is inclusive of the previous one (e.g. 1:5) ([#493](ycphs/openxlsx#493)). ## Improvements * Improve detectDates ([#288](ycphs/openxlsx#288)) * Preserve window size and position, also `getWindowSize()` and `setWindowSize()` ([466](ycphs/openxlsx#466)) # openxlsx 4.2.6 * Fix external links ([#410](ycphs/openxlsx#410)) * Do not add unneccessary sheetPr node ([#409](ycphs/openxlsx#409)) * Add support for `namedRegion`s having dots and other special characters ([#338](ycphs/openxlsx#338)). * Add type blanks and not blanks to conditional formatting ([#311](ycphs/openxlsx#311)) # openxlsx 4.2.5 ## Fixes * `openxlsx_setOp()` now works with named list ([#215](ycphs/openxlsx#215)) * `loadWorkbook()` imports `inlineStr`. Values remain `inlineStr` when writing the workbook with `saveWorkbook()`. Similar `read.xlsx` and `readWorkbook` import `inlineStr`. * `read.xlsx()` no longer changes random seed ([#183](ycphs/openxlsx#183)) * fixed a regression that caused fonts to be read in incorrectly ([#207](ycphs/openxlsx#207)) * add option to save as read only recommended ([#201](ycphs/openxlsx#201)) * fixed writing hyperlink formulas ([#200](ycphs/openxlsx#200)) * `write.xlsx()` now throws an error if it doesn't have write permissions ([#190](ycphs/openxlsx#190)) * `write.xlsx()` now again uses the default of `overwrite = TRUE` for saving files ([#249](ycphs/openxlsx#249)) * `as.character.formula()` exported to warn about potential conflicts with other packages ([#312](ycphs/openxlsx#312), [#315](ycphs/openxlsx#315)) ## Improvements * `options()` are more consistently set in functions (see: [#289](ycphs/openxlsx#262)) * `Workbook$show()` no longer fails when called in a 0 sheet workbook([#240](ycphs/openxlsx#240)) * `read.xlsx()` again accepts `.xlsm` files ([#205](ycphs/openxlsx#205), [#209](ycphs/openxlsx#209)) * `makeHyperlinkString()` does no longer require a sheet argument ([#57](ycphs/openxlsx#57), [#58](ycphs/openxlsx#58)) * improvements in how `openxlsx` creates temporary directories (see [#262](ycphs/openxlsx#262)) * `writeData()` calls `force(x)` to evaluate the object before options are set ([#264](ycphs/openxlsx#264)) * `createComment()` now correctly handles `integers` in `width` and `height` ([#275](ycphs/openxlsx#275)) * `setStyles()` accepts `halign="justify"` ([#305](ycphs/openxlsx#305))

jmbarbone changed the title ~~read_workbook(detectDates = TRUE) failing~~ read.xlsx(detectDates = TRUE) failing Nov 7, 2021

JanMarvin added a commit to JanMarvin/openxlsx that referenced this issue Nov 8, 2021

fix ycphs#288

9b1011b

JanMarvin mentioned this issue Nov 8, 2021

Improve detectDates. closes #288 #291

Merged

JanMarvin added a commit to JanMarvin/openxlsx that referenced this issue Nov 8, 2021

fix ycphs#288

45d7fdc

JanMarvin added a commit to JanMarvin/openxlsx that referenced this issue Dec 15, 2021

fix ycphs#288

ff213c5

JanMarvin added a commit that referenced this issue Jan 3, 2022

Merge pull request #291 from JanMarvin/gh_issue_288

0bde8cf

Improve detectDates. closes #288

JanMarvin referenced this issue in JanMarvin/openxlsx2 Feb 28, 2022

[R/wb_to_df] enable reading POSIXct dates and tests

38854bf

github-actions bot added the Stale label Mar 30, 2024

github-actions bot removed the Stale label Apr 6, 2024

JanMarvin closed this as completed in ece7978 Aug 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`read.xlsx(detectDates = TRUE)` failing #288

`read.xlsx(detectDates = TRUE)` failing #288

jmbarbone commented Nov 7, 2021

ProfFancyPants commented Feb 28, 2022

JanMarvin commented Feb 28, 2022

ProfFancyPants commented Mar 1, 2022

JanMarvin commented Mar 1, 2022

ProfFancyPants commented Mar 1, 2022 •

edited

Loading

JanMarvin commented Mar 1, 2022

deschen1 commented Jul 6, 2022

cha-petersumm commented Mar 30, 2023

github-actions bot commented Mar 30, 2024

CarolusKwok commented Apr 5, 2024

read.xlsx(detectDates = TRUE) failing #288

read.xlsx(detectDates = TRUE) failing #288

Comments

jmbarbone commented Nov 7, 2021

ProfFancyPants commented Feb 28, 2022

JanMarvin commented Feb 28, 2022

ProfFancyPants commented Mar 1, 2022

JanMarvin commented Mar 1, 2022

ProfFancyPants commented Mar 1, 2022 • edited Loading

JanMarvin commented Mar 1, 2022

deschen1 commented Jul 6, 2022

cha-petersumm commented Mar 30, 2023

github-actions bot commented Mar 30, 2024

CarolusKwok commented Apr 5, 2024

`read.xlsx(detectDates = TRUE)` failing #288

`read.xlsx(detectDates = TRUE)` failing #288

ProfFancyPants commented Mar 1, 2022 •

edited

Loading