-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version check for serialized data headers #35376
Conversation
Check that the serialization data format is compatible before attempting to read a serialized stream. New versions of Serialization are assumed to be able to read old serialized data, but attempting to read newer data with an older version of Serialization will fail with an error.
error("Unknown endianness flag in header") | ||
# Check protocol compatibility. | ||
endian_bom == ENDIAN_BOM || error("Serialized byte order mismatch ($(repr(endian_bom)))") | ||
wordsize == sizeof(Int) || error("Serialized word size mismatch ($wordsize)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure these should be errors; in many cases data will be readable moving between a 32-bit and 64-bit machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the endianness being mismatched will break nearly everything unless we handle it explicitly? And we don't try to handle it right now AFAICT.
But quite true regarding sizeof(Int)
as it's explicitly encoded as Int32
and Int64
and should deserialize ok at the other end. Should it be a warning perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a couple of quick tests. It turns out that that removing the wordsize
check does allow very simple things like homogenous arrays of integers to be deserialized between 32 and 64 bit Julia versions. So that can be helpful.
Once things get slightly complicated things start to break. For example serialized arrays of type Vector{Tuple{Int32,Int64}}
are not portable due to platform ABI differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @c42f, @JeffBezanson, this 32/64 cross-architecture support is very useful to me. I have a cluster of 32-bit ARM sensor/control boards talking to a 64-bit intel master. I can't use addprocs()
because of the architecture mismatch, but I've rolled my own simple RPC based on serialize(io, eval(deserialize(io)))
. It works nicely for my use case with simple data types (and seems vastly preferable to resorting to MsgPack.jl or ProtoBuf.jl)
It seems like it would be useful to have a serialize_compat
function that guarantees to produce cross-architecture / cross-julia-version output. It could initially just throw an ArgumentError for input that isn't compatible. Support for more complex types could be added as needed.
I can see that it makes sense to keep serialize
efficiency-focussed and keep the constraint of "must have same system image".
Do you think it would be generally useful to expose a subset of serialize
with a compatibility guarantee?
Test failure is UnicodeData.txt |
Ok, I've done the wordsize change as discussed, and tests have passed so I'll merge this small change. Hopefully this will reduce confusion for people maintaining Julia on ad-hoc clusters. |
Check that the serialization data format is compatible before attempting to read a serialized stream. New versions of Serialization are assumed to be able to read old serialized data, but attempting to read newer data with an older version of Serialization will fail with an error.
Check that the serialization data format is compatible before attempting to read a serialized stream. New versions of Serialization are assumed to be able to read old serialized data, but attempting to read newer data with an older version of Serialization will fail with an error.
Check that the serialization data format is compatible before attempting to read a serialized stream. New versions of Serialization are assumed to be able to read old serialized data, but attempting to read newer data with an older version of Serialization will fail with an error.
This code was written as part of an alternative solution to patching up
CodeInfo
serialization in #35371, but seems like it could be useful: it should prevent confusion if a newer julia versions somehow ends up sending data to older workers.