-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suboptimal inference for incompletely specified named tuples #29970
Comments
This is due to the constructors being |
Yeah we're coming up against this one, too. It would be nice to infer the use I didn't follow your explanation, either. I would have guessed we would arrive at the |
OK, the reflection tools seem to be indicating that they consider the input types to be "abstract" or non-leaf types.
Would it be fair to say that this heuristic is a little outdated for simple |
This is not a heuristic. We decided not to ever invoke generators on abstract types, because for that to be sound they have to be monotonic: you have to make sure that your generated code infers to a subtype of what we infer in any case where the arguments are a supertype. Of course that is possible, but in practice it is very difficult to ensure, and you get a segfault if you get it wrong. |
Thanks for the explanation. So what is the best way forward in this case? As far as I can tell, the main problem with And the other option was to enable inference on the |
I wonder if the following MWE is this issue: define functions f(::Type{Union{Missing, T}}) where T <: Real = rand() < 0.5 ? zero(T) : missing
_fs(::Type{Tuple{}}) = ()
_fs(::Type{T}) where T <: Tuple = (f(Base.tuple_type_head(T)), _fs(Base.tuple_type_tail(T))...)
fs(::Type{NamedTuple{N,T}}) where {N,T} = NamedTuple{N,T}(_fs(T)) and apply as (output trimmed) julia> VERSION
v"1.1.0-DEV.671"
julia> T = NamedTuple{(:a, :b), Tuple{Union{Missing, Int}, Union{Missing, Float64}}}
NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Union{Missing, Float64}}}
julia> @code_warntype fs(T)
Body::Any
24 ─ %66 = (Core.tuple)(%32, %64)::Tuple{Union{Missing, Int64},Union{Missing, Float64}}
└─── goto #25 ││
25 ─ %68 = (NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Union{Missing, Float64}}})(%66)::Any
└─── return %68 │ Also, is the current workaround something like fs_fixed(::Type{NamedTuple{N,T}}) where {N,T} = NamedTuple{N,T}(_fs(T))::NamedTuple{N,T} |
Mostly fixes #34. I think I can live with waiting for a proper fix of JuliaLang/julia#29970 for more than 32 columns.
This is the best workaround I can come up with at short notice: let
for n in 1:32
Ts = [Symbol("T$i") for i in 1:n]
xs = [:(x[$i]) for i in 1:n]
NT = :(Core.NamedTuple{names, Tuple{$(Ts...)}})
eval(quote
$NT(x::Tuple{$(Ts...)}) where {names, $(Ts...)} = $(Expr(:new, NT, xs...))
end)
end
end It seems to work ok, in particular it fixes JuliaData/TypedTables.jl#34. Of course if I need more than 32 elements AND Would a workaround PR like this be welcome, or should we think of a more fundamental fix? (Another approach I explored was using the inner constructor as a thunk that makes a simple wrapper around the tuple to make it concrete, ending up calling a generated function that is not an inner constructor but calls EDIT: Hmm... my "workaround" seems to fix inference but generated code is horrible. |
Let me just throw in another MWE that seems to be caused by this: getfirstvals(a,b) = (a=a[1],b=b[1])
a=zeros(Union{Float64,Missing},10)
b=zeros(Union{Float64,Missing},10)
getfirstvals(a,b) This does not infer as well and any fix or workaround would be welcome. |
Fixed by @JeffBezanson in #30577. |
The workaround for JuliaLang/julia#29970 causes newer versions of Julia to hang in the precompilation step for TypedTables. Since JuliaLang/julia#29970 has been addressed in JuliaLang/julia#30577 the code is avoided in newer versions of Julia.
Version check workaround for JuliaLang/julia#29970
So I've been doing some research around issues that affect common data processing tasks that include julia> code_typed(f, Tuple{Union{Int64, Float64}})
1-element Array{Any,1}:
CodeInfo(
1 ─ %1 = Core.tuple(x)::Tuple{Union{Float64, Int64}}
│ %2 = (NamedTuple{(:x,),T} where T<:Tuple)(%1)::NamedTuple{(:x,),_A} where _A
└── return %2
) => NamedTuple{(:x,),_A} where _A which, while better than when originally reported on this issue, is still suboptimal: What would we need to allow this? It seems like a compiler/optimizer issue where we need to teach it how to recognize that this kind of scenario is preferable to infer. I'm happy to dive in if someone can given some guidance, or reproducible scenarios where this is an issue, or whatever I can do to help. |
At run time that will give either a NamedTuple of Int64 or of Float64, so inferring the type with the union would not be correct. This is a pretty fundamental problem. |
The issue boils down to the fact that named tuples are invariant, unlike tuples: julia> Tuple{Int64} <: Tuple{Number}
true
julia> NamedTuple{(:x,), Tuple{Int64}} <: NamedTuple{(:x,), Tuple{Number}}
false But it seems like it should be possible to infer the tighter return type NamedTuple{(:x,), Tuple{T}} where T <: Union{Int64,Float64} which would be correct. Not arguing whether it's reasonable to expect this or not, but it would be a correct but tighter return type that could be inferred. |
Yeah, I'm not sure if that would actually be helpful for later dataflow analysis though, since then any extraction from the namedtuple would be Thanks for the explanations everyone (including @vtjnash offline); I have a couple alternative approaches I'm going to try out as work-arounds for the issue here. |
@quinnj don't know if it helps, but in my specific use case I have been using this https://github.com/meggart/SentinelMissings.jl to get efficient iterable tables with missings. In my case missings are encoded through special values in my input data anyway, so simply wrapping them in a new number type helped making the table type stable. I know this is against the spirit of missings but may be useful for some use cases. |
Thanks for the link @meggart! That looks like a clever little package. That would actually immediately allow resolution of a recent CSV.jl issue. I think there are definitely some ideas like that to help in these scenarios. |
Making NamedTuples covariant wouldn't magically fix this, since then |
It's indeed unfortunate because Ok, since I've been thinking about this for the past two days straight, let me share some thoughts if only to get them out of my head onto proverbial github paper, warning that ideas may range from impossible to insane:
struct UnionValue{T, S}
x::Union{T, S}
end the idea here is to "first-class" the kind of field we have in
%38 = Core.tuple(%23, %37)::Tuple{Int64,Union{Missing, Float64}}
%39 = (NamedTuple{(:x, :y),T} where T<:Tuple)(%38)::NamedTuple{(:x, :y),_A} where _A it could do: %38 = Core.tuple(%23, %37)::Tuple{Int64,Union{Missing, Float64}}
%39 = (NamedTuple{(:x, :y),Tuple{Int64, T} where T<:Union{Missing, Float64}}})(%38)::NamedTuple{(:x, :y),Tuple{Int64, T} where T<:Union{Missing, Float64}}}}
function f(x::NamedTuple)
return (x=x.id + 1, y::?=x.salary * 2)
end
# or for Pair
function f(x::NamedTuple)
return Pair{Int, ?}(x.id, x.salary * 2)
end this gives a kind of SQL-parameter-binding feel w/ the use of Realistically, I think some way of allowing users to specify individual field types is probably the biggest bang for development buck; and maybe just coming up w/ an alternative API altogether that avoids the direct NamedTuple-to-NamedTuple projection and breaks the fields out individually before putting them back in the final NamedTuple. |
The
where |
Yeah, this is certainly interesting. Do you imagine this would just be extra "metadata" propagated by the compiler during codegen? Or some actual type like: struct TypedValue{T}
x::T
TypedValue(fieldtype, x) = new{fieldtype}(x)
end that was automatically wrapped by the compiler and used internally, that was somehow treated as a "unwrapped" value when used? |
The main issue with the UnionValue idea, which is similar to what many static languages call sum types but which aren’t actually sum types because they require explicit wrapping and unwrapping, is precisely that—we would also require explicit wrapping and unwrapping, which makes them inconvenient. In that respect, Julia’s union types are close to true sum types, except, of course that they don’t enforce disjointness (although that’s rarely actually an issue, except for a theoretical standpoint). If setting/getting a field/slot of such a type always did automatic wrapping/unwrapping but the type was concrete—I’m not even sure that’s coherent—that feels like some of what’s wanted. To some extent this seems like it’s about hoisting or lowering the selector bits outwards or inwards as necessary. The trouble now seems to be that sometimes we have the bits on an individual field and want them on the containing structure, selecting between different concrete layouts, and vice versa. And of course, in those situations you want a guarantee that the layouts are compatible or it’s not all that helpful. Also just thinking out loud on this. |
Yeah, my hope w/ the # something like
f(nt::NamedTuple{(:x,), Tuple{UnionValue{Int64, Missing}}}) = (x=nt.x + 1,)
# would expand in codegen to something like
function f(nt::NamedTuple{(:x,), Tuple{UnionValue{Int64, Missing}}})
x = nt.x
if x.value isa Missing
y = UnionValue{Int64, Missing}(x.value + 1)
else # x.value isa Int64
y = UnionValue{Int64, Missing}(x.value + 1)
end
return (x=y,)
end The part that then doesn't make a lot of sense in some ways is allowing function sin(x)
if x.value isa Int64
UnionValue{Int64, Missing}(sin(x.value))
else
UnionValue{Int64, Missing}(sin(x.value))
end
end but we don't really do/allow complete function call overloading like this. It makes me think that So maybe as an alternative to having f(nt) = (x=?(nt.x) + 1,) where I'm still not sure that completely solves things, because we still want the NamedTuple constructor to be happy a potential incoming Union field, so maybe we'd need |
This is an interesting line of thought related to #32448 (comment): it would be neat from an ABI standpoint if we hoisted all selector bits outward. In this way arrays of nested structs-and-unions-of-bits could become ABI compatible with C structs and unions when the parallel array of selector data is ignored. If we followed this rule, your
Yes I think so. Do you have an example of where always hoisting the selectors out would be a bad thing and we'd prefer the selector bits to be stored with the data? |
It would be nice if these could at least figure out that it's gonna be a
NamedTuple{(:x,), <:Any}
, or even if the second one distributed over the union.The text was updated successfully, but these errors were encountered: