enable field-order-agnostic overloads of `fromarrow` for struct types #493
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #493      +/-   ##
==========================================
- Coverage   87.45%   87.37%   -0.08%
==========================================
  Files          26       26
  Lines        3283     3288       +5
==========================================
+ Hits         2871     2873       +2
- Misses        412      415       +3
be7a4fe to c572d7a
src/arraytypes/struct.jl
Outdated
@@ -33,23 +33,33 @@ isnamedtuple(T) = false
istuple(::Type{<:Tuple}) = true
istuple(T) = false

@propagate_inbounds function Base.getindex(s::Struct{T,S}, i::Integer) where {T,S}
    if isdefined(ArrowTypes, :StructElement)
Went with this approach to avoid needing to change Arrow's compat bounds on ArrowTypes
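For readers unfamiliar with the pattern, here is a minimal sketch of feature detection via `isdefined`, which lets code branch on whether a dependency exports a given name instead of requiring a newer version of it. The modules `OldDep` and `NewDep` and the helper `supports_struct_element` are hypothetical stand-ins for two versions of a dependency; they are not part of Arrow.jl or ArrowTypes.

```julia
# Hypothetical stand-ins for an older and a newer version of a dependency.
module OldDep end

module NewDep
struct StructElement{T} end
end

# Hypothetical helper: true only when the dependency defines `StructElement`.
supports_struct_element(dep::Module) = isdefined(dep, :StructElement)

println(supports_struct_element(OldDep))
println(supports_struct_element(NewDep))
```

The trade-off of this style is that the check runs wherever the branch is evaluated, so it is worth benchmarking when it sits on a hot path such as `getindex`.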
Did some benchmarking, looks like there is a small perf hit here to the general case that needs to be mitigated before this can be merged:
using Arrow, ArrowTypes, Random, BenchmarkTools
struct Foo
    a::Int
    b::String
    c::Vector{String}
    d::Float64
end
ArrowTypes.arrowname(::Type{Foo}) = Symbol("JuliaLang.Foo")
ArrowTypes.JuliaType(::Val{Symbol("JuliaLang.Foo")}, T) = Foo
genfoo() = Foo(rand(1:10), randstring(10), [randstring(10) for _ in 1:rand(2:5)], rand())
t = (; f = [genfoo() for _ in 1:1000])
f = Arrow.Table(Arrow.tobuffer(t)).f
@benchmark sum(x -> x.d, $f)
results on
results on
Alright, I changed the approach here after profiling different candidate approaches. Looks like leaving on this PR now:
BenchmarkTools.Trial: 8170 samples with 1 evaluation.
Range (min … max): 530.500 μs … 2.318 ms ┊ GC (min … max): 0.00% … 74.22%
Time (median): 547.833 μs ┊ GC (median): 0.00%
Time (mean ± σ): 610.966 μs ± 263.948 μs ┊ GC (mean ± σ): 9.75% ± 14.73%
█▆▃▁ ▁▃▁ ▁
████▇▇▅▃▅▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄███ █
530 μs Histogram: log(frequency) by time 1.75 ms <
Memory estimate: 2.22 MiB, allocs estimate: 18387.
I searched for `Struct{` and found a usage at line 530 in 787768f:
function arrowtype(b, x::Struct{T,S}) where {T,S}
Generally looks good to me though.
Oh also, the formatter wants this change, it looks like
diff --git a/test/runtests.jl b/test/runtests.jl
index 5210685..ed288b3 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -1027,7 +1027,11 @@ end
end
ArrowTypes.arrowname(::Type{Foo493}) = Symbol("JuliaLang.Foo493")
ArrowTypes.JuliaType(::Val{Symbol("JuliaLang.Foo493")}, T) = Foo493
- function ArrowTypes.fromarrowstruct(::Type{Foo493}, ::Val{fnames}, x...) where {fnames}
+ function ArrowTypes.fromarrowstruct(
+ ::Type{Foo493},
+ ::Val{fnames},
+ x...,
+ ) where {fnames}
nt = NamedTuple{fnames}(x)
return Foo493(nt.x + 1, nt.y + 1)
end
Co-authored-by: Eric Hanson <5846501+ericphanson@users.noreply.github.com>
…into jr/structelement
interesting. I think based on the docstring of
if the intent of the function was to yield the actual flatbuffer definition of the arrow data stored in the provided column, then actually we'd consider this method buggy on
AFAICT this is good to merge - idk what the hanging "verify release" GHA check is doing but everything else is green 👍
@kou would you be able to help initiate a release for v2.7.0?
thanks @ericphanson and @kou !
Sure!
The vote thread: https://lists.apache.org/thread/dxx71lflxtt528rjco8fsjfl255bs628
@ericphanson Could you check my reply on
Awesome, and I see you registered both packages as well. Thanks very much!
positional arguments; so if my custom type `Interval` has two fields `first` and `last`, then I'd overload like
`ArrowTypes.fromarrow(::Type{Interval}, first, last) = ...`. Note the default implementation is
`ArrowTypes.fromarrow(::Type{T}, x...) = T(x...)`, so if your type already accepts all arguments in a constructor
no additional `fromarrow` method should be necessary (default struct constructors have this behavior).
* Alternatively, may overload `fromarrowstruct(::Type{T}, ::Val{fnames}, x...)`, where `fnames` is a tuple of the
Sorry to be so slow in the review here; it might be worth adding another example to the docs here like we have above, something like:
So taking the example from before, if my arrow data happens to have the `first` and `last` fields for my `Interval` type reversed, I could implement `fromarrowstruct(::Type{Interval}, ::Val{(:last, :first)}, last, first) = Interval(first, last)`
I find concrete examples usually help drive home the core point of a new feature.
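In that spirit, here is a minimal sketch of how a name-aware overload can be made independent of field order. It assumes a locally defined stand-in, `fromarrowstruct_sketch`, that mirrors the `fromarrowstruct(::Type{T}, ::Val{fnames}, x...)` signature from the docs diff; it is not the ArrowTypes method itself, so it runs without any packages installed. The `NamedTuple{fnames}(x)` trick is the same one used in the `Foo493` test.

```julia
struct Interval
    first::Int
    last::Int
end

# Hypothetical stand-in for the generic fallback: ignore the field names and
# pass the values to the constructor positionally.
fromarrowstruct_sketch(::Type{T}, ::Val{fnames}, x...) where {T,fnames} = T(x...)

# Name-aware overload for Interval: pair each value with its serialized field
# name, then pick the values out by name so the incoming order does not matter.
function fromarrowstruct_sketch(::Type{Interval}, ::Val{fnames}, x...) where {fnames}
    nt = NamedTuple{fnames}(x)
    return Interval(nt.first, nt.last)
end

in_order = fromarrowstruct_sketch(Interval, Val((:first, :last)), 1, 5)
reversed = fromarrowstruct_sketch(Interval, Val((:last, :first)), 5, 1)
println(in_order == reversed)
```

Because the field names travel in the type domain via `Val`, an overload can also dispatch on one specific ordering, as in the one-liner suggested above.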
Motivated by beacon-biosignals/Legolas.jl#94 (comment)
Still requires:
`@benchmark`ing the access in the test case from beacon-biosignals/Legolas.jl#94 ("Nested schema deserialization depends on column order") didn't reveal any perf difference, which seems like a good sign
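To make the column-order dependence concrete, here is a minimal sketch of the failure mode, assuming a purely positional fallback like the `T(x...)` default quoted in the docs diff. `fromarrow_sketch` and `Span` are hypothetical names defined locally for illustration; nothing here calls into Arrow.jl or ArrowTypes.

```julia
struct Span
    start::Int
    stop::Int
end

# Hypothetical stand-in for a positional default: values go to the constructor
# in whatever order the serialized fields arrive.
fromarrow_sketch(::Type{T}, x...) where {T} = T(x...)

# Data written with fields in (start, stop) order round-trips as expected.
good = fromarrow_sketch(Span, 1, 5)

# Data written with fields in (stop, start) order is silently mis-assigned,
# because the positional default never sees the field names.
swapped = fromarrow_sketch(Span, 5, 1)

println(good.start, " ", good.stop)
println(swapped.start, " ", swapped.stop)
```

No error is raised in the second case since both fields share a type, which is why a hook that receives the field names is useful.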