Invalid argument error #336

Closed
w-zhiwei opened this issue Sep 7, 2022 · 8 comments · Fixed by #469

w-zhiwei commented Sep 7, 2022

Saving a loaded table back into the same file raises an "Invalid argument" error.

This seems to be caused by mmap on Windows. See JuliaData/CSV.jl#170.

using Arrow
using DataFrames

df = DataFrame(rand(100, 100), :auto)
Arrow.write("test.arrow", df)

df = Arrow.Table("test.arrow")
Arrow.write("test.arrow", df)

The last line will raise an error.

ERROR: SystemError: opening file "test.arrow": Invalid argument
Stacktrace:
  [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
    @ Base .\error.jl:174
  [2] #systemerror#68
    @ .\error.jl:173 [inlined]
  [3] systemerror
    @ .\error.jl:173 [inlined]
  [4] open(fname::String; lock::Bool, read::Nothing, write::Nothing, create::Nothing, truncate::Bool, append::Nothing)
    @ Base .\iostream.jl:293
  [5] open(fname::String, mode::String; lock::Bool)
    @ Base .\iostream.jl:355
  [6] open(fname::String, mode::String)
    @ Base .\iostream.jl:355
  [7] open(::Arrow.var"#116#117"{Nothing, Nothing, Bool, Nothing, Bool, Bool, Bool, Int64, Int64, Float64, Bool, Arrow.Table}, ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base .\io.jl:328
  [8] open(::Function, ::String, ::String)
    @ Base .\io.jl:328
  [9] #write#115
    @ C:\Users\R9000K\.julia\packages\Arrow\SFb8h\src\write.jl:57 [inlined]
 [10] write(file_path::String, tbl::Arrow.Table)
    @ Arrow C:\Users\R9000K\.julia\packages\Arrow\SFb8h\src\write.jl:57
 [11] top-level scope
    @ Untitled-1:8

However, writing to a different file name works:

Arrow.write("test1.arrow", df)

@jeremiedb

I can reproduce this on Windows: Arrow cannot write to the same file path that it previously read from.


jeremiedb commented Nov 24, 2022

The issue isn't Windows-specific: the example above crashed the Julia session on Ubuntu.
To write to the same path that was read from, a copy appears to be necessary. For example, the following works:

using Arrow
using DataFrames

df = DataFrame(rand(100, 100), :auto)
Arrow.write("test.arrow", df)
df = copy(DataFrame(Arrow.Table("test.arrow")))
Arrow.write("test.arrow", df)

bkamins (Contributor) commented Nov 25, 2022

df = Arrow.Table("test.arrow") memory-maps the file, so this is expected. The difference between OSes might come from how each one handles file locking while a file is memory-mapped.
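
A minimal sketch of the mechanism, using Julia's standard-library Mmap module as a stand-in for the memory mapping that Arrow.Table performs (this is an illustration, not Arrow.jl's internals; the file name and contents are made up). It maps a file, takes an independent heap copy, releases the mapping explicitly with finalize, and only then overwrites the file:

```julia
using Mmap

# Hypothetical scratch file, used only for this illustration.
path = joinpath(mktempdir(), "data.bin")
write(path, UInt8[1, 2, 3, 4])

# `mapped` is backed by the file on disk, not by ordinary heap memory.
mapped = Mmap.mmap(path)

# Materialize a copy that no longer depends on the file.
owned = copy(mapped)

# Run the array's finalizer now so the mapping is released immediately,
# instead of waiting for the garbage collector. `mapped` must not be
# accessed after this point.
finalize(mapped)
mapped = nothing

# Nothing maps the file any more, so it can be overwritten.
write(path, reverse(owned))
```

If the mapping is still alive when the file is reopened for writing, the outcome depends on the OS: Windows can refuse to open the file, while on Linux truncating a mapped file and then touching the mapped pages can kill the process. With Arrow.Table there is no array to finalize by hand, so the mapping is only released once every reference to the table is gone and the garbage collector has run.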

@w-zhiwei (Author)

I forgot to copy the mmapped table in my sample code.

As @jeremiedb mentioned, on Linux you can overwrite the loaded Arrow file once the table has been copied, but the same code still raises an error on Windows. A temporary workaround from JuliaData/CSV.jl#170 is to call GC.gc() before saving.

w-zhiwei (Author) commented Dec 2, 2022

Just found out that if you use copy(DataFrame(Arrow.Table("test.arrow"))[:, :]) instead of copy(DataFrame(Arrow.Table("test.arrow"))), the loaded file can be overwritten on Windows. I don't know why these two behave differently.

bkamins (Contributor) commented Dec 3, 2022

@TanookiToad - this is strange: copy(df) and df[:, :] are almost the same (their only difference is how metadata is handled, but that should not affect the result here).

w-zhiwei (Author) commented Dec 3, 2022

> @TanookiToad - this is strange: copy(df) and df[:, :] are almost the same (their only difference is how metadata is handled, but that should not affect the result here).

Yeah, I was wrong about that. There's actually no difference between copy(df) and df[:, :] for this issue. After more tests, what confuses me is that the loaded data can only sometimes be rewritten on Windows.

For example, the following code works if "test.arrow" was created from DataFrame(rand(10000, 1000), :auto), but it raises an error for a smaller table such as DataFrame(rand(100, 100), :auto) or a larger one such as DataFrame(rand(10000, 10000), :auto):

df = copy(DataFrame(Arrow.Table("test.arrow")))
Arrow.write("test.arrow", df)

I've tested it with Julia v1.6.7, v1.7.2 and v1.8.2 (64-bit) on Windows 11 22H2. All of them give the same results.

quinnj added a commit that referenced this issue Jun 13, 2023
Should fix #336.

For more context, see the [same fix](JuliaData/CSV.jl@077e177)
we made for this in CSV.jl.

Basically, objects interpolated into or returned from spawned tasks can
be unexpectedly kept alive long after the task has finished and the object
should have been garbage-collected due to individual threads holding
the most recent task as a reference. Using `@wkspawn` ensures tasks themselves
don't hold references to any of these once they're done executing.

quinnj (Member) commented Jun 13, 2023

A fix for this is up: #469. Sorry for the slow response here, but it would be great if anyone on Windows could confirm that the original issue is fixed with that PR.

quinnj added a commit that referenced this issue Jun 14, 2023