fread fails when warning is caught: "Previous fread() session was not cleaned up properly. Cleaned up ok at the beginning of this fread() call" #2904

Open
slazicoicr opened this issue May 25, 2018 · 8 comments

Comments

slazicoicr commented May 25, 2018

The three lines below work as expected:

fread("will, work\njust, fine\nthank, you", header = FALSE, sep=",", sep2=",")
fread("fails, rather\nbadly, too\nbad", header = FALSE, sep=",", sep2=",")
fread("will, work\njust, fine\nthank, you", header = FALSE, sep=",", sep2=",")

The first and third lines read just fine, and the second line throws a warning.

The issue happens when the line that throws the warning is wrapped in a tryCatch block:

tryCatch({
  fread("fails, rather\nbadly, too\nbad", header = FALSE, sep=",", sep2=",")
}, warning = function(w) {
  conditionMessage(w)
})

fread("will, work\njust, fine\nthank, you", header = FALSE, sep=",", sep2=",")

Calling the last fread throws a warning message, even though it should work just fine:

Warning message:
In fread("will, work\njust, fine\nthank, you", header = FALSE, sep = ",",  :
  Previous fread() session was not cleaned up properly. Cleaned up ok at the beginning of this fread() call.

sessionInfo:

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4  
@mattdowle
Member

Thanks for reporting.
options(warn=2) was anticipated and does not generate this warning. But trapping warning() via tryCatch() wasn't anticipated unfortunately. Something 'unknown' was anticipated, though, and that's the coping mechanism you're seeing: it cleans itself up upon next call and issues a good warning. It could have been a lot worse.
I don't know of a way to tell at R level whether tryCatch(..., warning=) has been specified as something that halts, or not. Any attempt at that is likely to be messy and fragile.
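A minimal base-R sketch of the mechanism described above, not data.table's actual code: an exiting handler set up by tryCatch(..., warning=) unwinds the stack at the point where warning() is signalled, so any cleanup placed after that point never runs. The names `state` and `leaky_reader` are hypothetical stand-ins for fread's internal state.

```r
# Hypothetical stand-in for a reader that holds a resource while it works.
state <- new.env()
state$open <- FALSE

leaky_reader <- function() {
  state$open <- TRUE            # acquire the resource
  warning("parse problem")      # an exiting handler unwinds from here
  state$open <- FALSE           # cleanup, skipped if the stack unwinds
  "result"
}

# A calling handler (suppressWarnings) lets the function resume and finish,
# so the cleanup line runs.
res_plain <- suppressWarnings(leaky_reader())
open_after_plain <- state$open

# An exiting handler (tryCatch) abandons the function at warning(),
# so the cleanup line is never reached.
msg <- tryCatch(leaky_reader(), warning = function(w) conditionMessage(w))
open_after_trycatch <- state$open
```

After the first call `state$open` is back to FALSE; after the tryCatch call it is still TRUE, which is the analogue of the "not cleaned up properly" state.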

One solution might be for DTWARN in fread.c to cache the warning(s) in a private buffer and then call R/Python's warning() on exit after freadCleanup().
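The buffering idea can be sketched at R level as follows. This is only an illustration under assumptions: the real change would live in C inside fread.c, and `state`, `buffered_reader`, and `pending` are hypothetical names.

```r
# Hypothetical stand-in for the reader's internal state.
state <- new.env()
state$open <- FALSE

buffered_reader <- function() {
  pending <- character(0)                  # private buffer for warning text
  state$open <- TRUE                       # acquire the resource

  # Record the problem instead of calling warning() mid-read.
  pending <- c(pending, "parse problem")
  result <- "result"

  state$open <- FALSE                      # cleanup happens first
  # Only now signal the cached warnings; unwinding here is harmless.
  for (m in pending) warning(m, call. = FALSE)
  result
}

msg <- tryCatch(buffered_reader(), warning = function(w) conditionMessage(w))
open_after <- state$open
```

Because the cleanup runs before any warning is signalled, a caller's tryCatch can no longer interrupt it. One trade-off: a caller that uses an exiting handler still sees only the first cached warning, since the handler unwinds on that one.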

@mattdowle mattdowle added this to the 1.11.6 milestone May 30, 2018
@st-pasha
Contributor

What if we wrap the entire R fread(...) call in one big tryCatch(..., finally=freadCleanup) call?

@mattdowle
Member

That's a neat idea!
That would mean adding tryCatch(..., finally=.Call(CfreadCleanup)) around the .Call(CfreadR) here, I guess: https://github.com/Rdatatable/data.table/blob/master/R/fread.R#L101
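A base-R sketch of why the finally= approach would help, using hypothetical stand-ins (`state`, `raw_reader`, `cleanup`, `guarded_reader`) rather than data.table internals: the finally expression of an inner tryCatch is still evaluated when an outer tryCatch unwinds through it.

```r
# Hypothetical stand-ins for the reader's state and its cleanup routine.
state <- new.env()
state$open <- FALSE
cleanup <- function() state$open <- FALSE

raw_reader <- function() {
  state$open <- TRUE            # acquire the resource
  warning("parse problem")      # may trigger a caller's exiting handler
  "result"
}

# The wrapper guarantees cleanup on normal return and on unwinding alike.
guarded_reader <- function() {
  tryCatch(raw_reader(), finally = cleanup())
}

# The caller traps the warning, which unwinds through guarded_reader().
msg <- tryCatch(guarded_reader(), warning = function(w) conditionMessage(w))
open_after <- state$open
```

Note that finally= takes an expression that is evaluated on exit, so it has to be written as a call (`cleanup()` here); passing a bare function name would evaluate the name without calling it.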

@jangorecki jangorecki modified the milestones: 1.12.0, 1.11.6 Jun 6, 2018
@mattdowle mattdowle modified the milestones: 1.11.6, 1.12.0 Sep 20, 2018
@mattdowle mattdowle modified the milestones: 1.12.0, 1.12.2 Jan 11, 2019
@dhersz

dhersz commented Nov 10, 2020

Hello. I'm running into this same problem when trying to catch the "parsing-problems-related" warning messages in a custom file-reading function I'm creating. I've seen other packages using readr::problems to do this, but I'm trying to stick to data.table on this.

Do you have any suggestions on how to do this while staying away from this warning? I'll post the issue I created on my own repo below to exemplify the error I'm facing.

You can see that this line: gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") is what causes the warning to be thrown in the first place. When I try to use the same command again, it throws an unzip-related error. However, if I specify the second argument as stop_times (which means I'm only unzipping this file and overwriting the others [such as stops, that causes the problem]) I can subsequently run gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip"), apparently because of the cleanup.

I actually have an on.exit() call to unlink the temporary directory gtfsdir created, but when I have parsing failures (i.e. when I catch the warnings) the directory is not removed, even with force = TRUE. I have also tried to manually remove the directory, but I get the message that the file is being used by another program thus the directory cannot be removed.

ps: "não foi possível abrir o arquivo" means "could not open the file"


Also, another problem is that fread seems not to do well with tryCatch. Check this:

gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Warning message:
#> In read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") :
#>   Parsing failures while reading the following file(s): trips, stops
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Error in utils::unzip(path, files = files_to_read, exdir = temp_dir, overwrite = TRUE) : 
#>   não foi possível abrir o arquivo 'C:/Users/Usuario/AppData/Local/Temp/RtmpqS6Xww/gtfsdir/stops.txt': Invalid argument
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip", "stop_times")
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Warning message:
#> In read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") :
#>   Parsing failures while reading the following file(s): trips, stops

I actually suppress a possible warning when reading the gtfs due to how the function is structured. If I remove this warning suppression, you can see that fread does some cleaning after the "invalid argument" error.

gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Error in utils::unzip(path, files = files_to_read, exdir = temp_dir, overwrite = TRUE) : 
#>   não foi possível abrir o arquivo 'C:/Users/Usuario/AppData/Local/Temp/RtmpqS6Xww/gtfsdir/stops.txt': Invalid argument
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip", "stop_times")
#> Warning message:
#> In data.table::fread(file.path(temp_dir, file), nrows = 1) :
#>   Previous fread() session was not cleaned up properly. Cleaned up ok at the beginning of this fread() call.

Related data.table issue: #2904

@zoushucai

I have also encountered this problem recently. Do you have any good solutions? I need to read multiple CSV files in turn. Depending on the warning information, some files need to be read with a different symbol instead of the default double quotation mark.

Thank you

@dhersz

dhersz commented Oct 25, 2021

Hi @zoushucai, I started using withCallingHandlers() instead of tryCatch(), because it doesn't interrupt the running process. Perhaps it could be well suited for your needs as well.
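A minimal sketch of that approach, assuming a hypothetical helper `read_collecting_warnings()` and a stand-in `noisy_reader()` in place of the real reading call: the calling handler records each warning message and then invokes the "muffleWarning" restart, so the reader keeps running to completion.

```r
# Run `reader` (a zero-argument function), collecting warnings without
# interrupting it. Helper name and return shape are illustrative only.
read_collecting_warnings <- function(reader) {
  msgs <- character(0)
  value <- withCallingHandlers(
    reader(),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # remember the message
      invokeRestart("muffleWarning")         # resume where warning() was called
    }
  )
  list(value = value, warnings = msgs)
}

# Stand-in for a reader that warns twice and still returns a value.
noisy_reader <- function() {
  warning("first problem")
  warning("second problem")
  "result"
}

out <- read_collecting_warnings(noisy_reader)
```

Here `out$value` holds the reader's result and `out$warnings` holds both messages. In practice the argument could be something like `function() data.table::fread(path)`, with `path` supplied by the caller, and the collected messages can then drive a retry with different arguments.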

@zoushucai

Thank you for your suggestion. My problem has been solved @dhersz

@ldmax

ldmax commented Jan 17, 2022

That's a neat idea! Would be adding tryCatch(..., finally=.Call(CfreadCleanup)) around the .Call(CfreadR) here I guess : https://github.com/Rdatatable/data.table/blob/master/R/fread.R#L101

Hi @mattdowle, could you please point out how I should use

    ...
    finally=freadCleanup
    ...

or

    ...
    finally=.Call(CfreadCleanup)
    ...

Every time I try either of the above, I get a 'C symbol name "freadCleanup" not in load table' error.
Thank you in advance!
