Removing transform field and creating TransformedKernel (and ScaledKernel) #32
Conversation
src/kernels/constant.jl (outdated)

```julia
ConstantKernel(t::Tr,c::Tc=1.0) where {Tr<:Transform,Tc<:Real} = ConstantKernel{Tr,Tc}(t,c)

params(k::ConstantKernel) = (k.c)
```
Is `params` (and `opt_params`) supposed to return a tuple? This will just return a scalar value.
This is just a bad leftover from something else I did. It is related to #31; I will make a more thorough PR at some point.
```julia
struct ScaledKernel{Tk<:Kernel,Tσ<:Real} <: Kernel
    kernel::Tk
    σ::Vector{Tσ}
```
I'm curious, why is the scaling parameter a vector? Couldn't you just use `σ::Tσ` instead of `σ::Vector{Tσ}`?
The idea is that it makes the types much more Zygote-compatible. One can then optimize the parameters in place by passing a reference to the parameter; a scalar would not allow this. One other option would be to use a `Ref{Tσ}`, but I find this quite ugly in general!
Hmm, I'm pretty sure that not using a vector should be completely fine with Zygote - at least I would be surprised if it's not: https://fluxml.ai/Zygote.jl/dev/#Structs-and-Types-1
Sorry, I was not clear. It's actually to be more compatible with Flux. The whole implicit-gradient approach allows you to save references to the parameters (via an `IdDict`) and apply the gradients directly in place without having to build new objects. But this is only possible for mutable objects, so it would not work for a scalar. Or maybe I got everything wrong about Flux 😅
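A minimal sketch of that workflow, assuming Flux's implicit-parameter (`Flux.params`) API; `Scaled` is a hypothetical stand-in for `ScaledKernel`, not code from this PR:

```julia
using Flux

# Hypothetical wrapper: the scalar lives in a length-1 vector so that it is a
# mutable array that Flux can track by reference.
struct Scaled
    σ::Vector{Float64}
end

k = Scaled([2.0])

# Flux.params collects mutable arrays into an IdDict-backed Params object;
# a plain scalar field could not be registered this way.
ps = Flux.params(k.σ)

# Gradient of a toy loss with respect to the implicit parameters.
gs = Flux.gradient(() -> sum(abs2, k.σ), ps)

# The optimiser mutates k.σ in place through the stored reference.
Flux.Optimise.update!(Descent(0.1), ps, gs)
```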
What's the issue with using a `Ref` rather than a `Vector` here? I would have thought that a reference to a `Real` would be more readable than a `Vector{<:Real}` that you have to know is always of length 1?
Do we actually need an `IdentityTransform`?

```julia
kappa(κ::TransformedKernel, d) = kappa(κ.kernel, d)
kappa(κ::TransformedKernel, x, y) = kappa(κ.kernel, κ.transform(x), κ.transform(y))
```

Maybe one could define some more convenient methods such as

```julia
transform(f, k::Kernel) = TransformedKernel(k, f)
Base.:*(s::Real, k::Kernel) = ScaledKernel(k, s)
Base.:*(k::Kernel, s::Real) = s * k
```

In general, I think it might be a bit confusing that `TransformedKernel` transforms the inputs whereas `ScaledKernel` transforms the evaluated kernel function. I'm not sure if this could be avoided by different or more explicit names.
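For illustration, with those convenience definitions in place one could write (a hypothetical usage sketch; the constructors are the ones introduced in this PR):

```julia
k = SqExponentialKernel()

k1 = transform(ScaleTransform(0.1), k)  # TransformedKernel: rescales the inputs
k2 = 2.0 * k                            # ScaledKernel: scales the kernel values
k3 = k * 2.0                            # same as k2, via the commuted method
```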
> Do we actually need an `IdentityTransform`?

Well, yeah, we could drop it; the advantage so far is that it unifies everything (base kernels and transformed kernels).

> Maybe one could define some more convenient methods

Why not for the first one, although I am starting to feel we are overloading `transform` with a lot of unrelated stuff. The second one is already there. I don't think the third one is a good idea.

> In general, I think it might be a bit confusing that TransformedKernel transforms the inputs whereas ScaledKernel transforms the evaluated kernel function. I'm not sure if this could be avoided by different or more explicit names.

Changing `ScaledKernel` sounds like a good idea, any suggestions?
Oh, I thought that could be the only implementation of …

Oh, that's a perfectly fair point then!
```julia
struct TransformedKernel{Tk<:Kernel,Tr<:Transform} <: Kernel
```
Maybe one could remove the `Transform` type and allow arbitrary types and functions as transformations?
The idea is to have a wrapper around all types.
Looks good so far. Just a few things to sort out.
src/kernels/transformedkernel.jl (outdated)

```julia
    transform::Tr
end

@inline transform(k::TransformedKernel) = k.transform
```
Do we need a transform function anymore?
src/generic.jl (outdated)

```julia
@inline metric(κ::Kernel) = κ.metric

@inline transform(::Kernel) = IdentityTransform()
```
Do we need a transform function anymore?
src/kernels/kernelsum.jl (outdated)

```diff
 kernels::Vector{Kernel}
 weights::Vector{Real}
-function KernelSum{Tr}(kernels::AbstractVector{<:Kernel},weights::AbstractVector{<:Real}) where {Tr}
-    new{Tr}(kernels,weights)
+function KernelSum(kernels::AbstractVector{<:Kernel},weights::AbstractVector{<:Real})
```
Can probably get rid of this constructor entirely.
> What's the issue with using a `Ref` rather than a `Vector` here? I would have thought that a reference to a `Real` would be more readable than a `Vector{<:Real}` that you have to know is always of length 1?
I went to make the changes to drop the need for …
About the … But the functions used to set values to a …
My suggestion would be to define

```julia
# default implementation without considering transforms at all
function kernelmatrix!(
    K::Matrix,
    κ::Kernel,
    X::AbstractMatrix;
    obsdim::Int = defaultobs
)
    @assert obsdim ∈ [1, 2] "obsdim should be 1 or 2 (see docs of kernelmatrix)"
    if !check_dims(K, X, X, feature_dim(obsdim), obsdim)
        throw(DimensionMismatch("Dimensions of the target array K $(size(K)) are not consistent with X $(size(X))"))
    end
    map!(x -> kappa(κ, x), K, pairwise(metric(κ), X, dims = obsdim))
end

# special implementation for TransformedKernel
function kernelmatrix!(
    K::Matrix,
    κ::TransformedKernel,
    X::AbstractMatrix;
    obsdim::Int = defaultobs
)
    kernelmatrix!(K, κ.kernel, κ.transform(X, obsdim); obsdim = obsdim)
end
```
The problem with this is that we end up having the exact problem we were trying to avoid! 😆
… would produce the exact same result.
By the way, one should probably better use `Base.RefValue{Int}` than `Ref{Int}` as a field type:

```julia
julia> struct S
           s::Ref{Int}
       end

julia> struct T
           s::Base.RefValue{Int}
       end

julia> T(x::Int) = T(Ref(x))
T

julia> f(x) = x.s[]
f (generic function with 1 method)

julia> a = S(1)
S(Base.RefValue{Int64}(1))

julia> b = T(1)
T(Base.RefValue{Int64}(1))

julia> @code_warntype f(a)
Variables
  #self#::Core.Compiler.Const(f, false)
  x::S

Body::Any
1 ─ %1 = Base.getproperty(x, :s)::Ref{Int64}
│   %2 = Base.getindex(%1)::Any
└──      return %2

julia> @code_warntype f(b)
Variables
  #self#::Core.Compiler.Const(f, false)
  x::T

Body::Int64
1 ─ %1 = Base.getproperty(x, :s)::Base.RefValue{Int64}
│   %2 = Base.getindex(%1)::Int64
└──      return %2
```
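The reason is that `Ref{Int}` is an abstract type (with concrete subtypes such as `Base.RefValue{Int}`), so a field annotated `Ref{Int}` cannot be inferred concretely; that is what the `Body::Any` in the first `@code_warntype` output shows, whereas the `Base.RefValue{Int}` field infers as `Int64`.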
Having now thought about this a bit more, I'm not sure that there's any way around what @devmotion proposes (which is also what I had imagined happening). The reason being that, for example, AFAICT, the …

I thought about this a bit more as well, and not having a …

I thought this PR tries to solve the problem that we have to carry around …

IMO the amount of functions that have to be added is not unreasonably large - and at the same time it allows us to remove all uses of …

Ok, I will implement these changes then.

No worries. No prizes for guessing what the deadline is though, haha.

IMO it doesn't matter. It seems there's only an advantage in using …
I was playing around with this PR a bit since I would like to move (almost) all kernel-related functionality of my package to KernelFunctions. However, I noticed that the current implementation leads to a severe performance regression when a `ScaleTransform` is involved:

```
# using `exponentialkernel` (i.e. no `ScaleTransform`)
1.137 ms (0 allocations: 0 bytes)

# using `exponentialkernel(0.1)`
2.432 ms (39805 allocations: 6.07 MiB)
```

Maybe one could add more efficient implementations such as

```julia
function kappa(κ::TransformedKernel{<:ExponentialKernel,<:ScaleTransform}, x, y)
    k = κ.kernel
    s = κ.transform.s[]
    kappa(k, s * evaluate(metric(k), x, y))
end
```

(probably in a more modular/general way) for the default kernels? With this implementation and `exponentialkernel(0.1)` I get

```
1.009 ms (5 allocations: 80 bytes)
```

By switching to a scalar field, even

```
1.002 ms (0 allocations: 0 bytes)
```
When you say you use …
It's different estimators and I don't need all pairwise distances for all of them, so I'm evaluating just the combinations I need, e.g., in https://github.com/devmotion/CalibrationErrors.jl/blob/1cad3a183ea4c0f34758dd938dfd3b8cec393409/src/skce/generic.jl#L224-L228. The code is a bit outdated, I haven't pushed my latest changes.
Ah, I understand now what you propose: you want to have specialized cases for some specific …
Yes, that was my idea. IMO the …
Ok, I got a two-liner: …
I'm a bit worried about kernels with different metrics. Maybe we could dispatch on the metric:

```julia
kappa(κ::TransformedKernel{<:Kernel,<:ScaleTransform}, x, y) = kappa(κ.kernel, _scale(κ.transform, metric(κ.kernel), x, y))

_scale(t::ScaleTransform, metric::Euclidean, x, y) = first(t.s) * evaluate(metric, x, y)
_scale(t::ScaleTransform, metric::SqEuclidean, x, y) = first(t.s)^2 * evaluate(metric, x, y)
_scale(t::ScaleTransform, metric, x, y) = evaluate(metric, apply(t, x), apply(t, y))
```
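This works because the Euclidean distance is absolutely homogeneous, `‖s*x - s*y‖ = s * ‖x - y‖` for `s > 0`, while the squared Euclidean distance picks up a factor of `s^2`; pre-scaling the precomputed distance therefore gives the same result as transforming the inputs, without the extra allocations.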
src/kernels/constant.jl (outdated)

```diff
-metric(::ConstantKernel) = Delta()
+@inline metric(::ConstantKernel) = Delta()
```
As a side remark, IMO usually it's not required to add `@inline`, in particular not for small functions such as `metric`. In these cases, I think one can usually trust the compiler, as stated in the docstring:

```
help?> @inline
  @inline

  Give a hint to the compiler that this function is worth inlining.

  Small functions typically do not need the @inline annotation, as the
  compiler does it automatically. By using @inline on bigger functions, an
  extra nudge can be given to the compiler to inline it.
```
To add to the massive changes of this PR: …
```diff
 ### Syntactic sugar for creating matrices and using kernel functions
-for k in [:ExponentialKernel,:SqExponentialKernel,:GammaExponentialKernel,:MaternKernel,:Matern32Kernel,:Matern52Kernel,:LinearKernel,:PolynomialKernel,:ExponentiatedKernel,:ZeroKernel,:WhiteKernel,:ConstantKernel,:RationalQuadraticKernel,:GammaRationalQuadraticKernel]
+for k in subtypes(BaseKernel)
```
Unfortunately, that does not add the syntactic sugar for user-defined `BaseKernel`s - in the future (when support for Julia < 1.3 is dropped) this can be done in a nice way due to JuliaLang/julia#31916: one can just define

```julia
(κ::BaseKernel)(d::Real) = kappa(κ, d)
```

and so on.
Oh, I did not know there was this new feature in 1.3. For now, and as an alternative, I was thinking of creating a macro that would do the same for a user-defined kernel, including creating the `TransformedKernel` wrapper.
src/generic.jl (outdated)

```diff
 ## Constructors for kernels without parameters
-for kernel in [:ExponentialKernel,:SqExponentialKernel,:Matern32Kernel,:Matern52Kernel,:ExponentiatedKernel]
+for k in Symbol.(subtypes(BaseKernel))
+    k = Symbol(string(k)[17:end])
```
I guess slightly less confusing would be

```julia
for k in nameof.(subtypes(BaseKernel))
```
Damn, I spent hours trying to find a solution for this 😅
src/generic.jl (outdated)

```diff
-for kernel in [:ExponentialKernel,:SqExponentialKernel,:Matern32Kernel,:Matern52Kernel,:ExponentiatedKernel]
+for k in Symbol.(subtypes(BaseKernel))
+    k = Symbol(string(k)[17:end])
+    new_k = Symbol(lowercase(string(k)))
```
I'm not completely sold on the idea of using lowercase functions here, but I'm also not fully convinced by using the uppercase `k` either. IMO, the lowercase definitions add quite a bit of inconsistency: now some kernels can be constructed with a lowercase function whereas others can't, everyone who implements custom `BaseKernel`s has to copy these definitions (and in contrast to the syntactic sugar above, I don't see how this problem could ever be solved by KernelFunctions, even on Julia >= 1.3), and one can construct kernels in two different ways (e.g., `sqexponentialkernel()` and `SqExponentialKernel()`), which feels weird. That's why I'm inclined to say maybe one should rather implement the uppercase version, such as `$k(ρ::Real; args...) = TransformedKernel($k(;args...), ScaleTransform(ρ))` - even though in this case it might come as a surprise that the return value is not of type `$k`. On the other hand, that should be fine, I guess?
I think that's the real problem: this would create a very problematic type inconsistency... For the user-defined case, again, I was thinking of making a macro that would automatically create these functions.
```julia
function wrap_kernel(k) #::KernelFunctions.BaseKernel
    new_k = Symbol(lowercase(string(k)))
    return @eval begin
        (κ::$k)(d::Real) = kappa(κ, d) #TODO Add test
        (κ::$k)(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}) = kappa(κ, x, y)
        (κ::$k)(X::AbstractMatrix{T}, Y::AbstractMatrix{T}; obsdim::Integer=defaultobs) where {T} = kernelmatrix(κ, X, Y, obsdim=obsdim)
        (κ::$k)(X::AbstractMatrix{T}; obsdim::Integer=defaultobs) where {T} = kernelmatrix(κ, X, obsdim=obsdim)
        $new_k(;args...) = $k(;args...)
        $new_k(ρ::Real;args...) = TransformedKernel($k(;args...), ScaleTransform(ρ))
        $new_k(ρ::AbstractVector{<:Real};args...) = TransformedKernel($k(;args...), ARDTransform(ρ))
        $new_k(t::Transform;args...) = TransformedKernel($k(;args...), t)
        export $new_k
    end
end
```
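One would then call, e.g., `wrap_kernel(:MyKernel)` at the top level, which generates and `eval`s these definitions into the enclosing module.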
That (`@eval` inside a function) does not feel right. IMO this is the perfect use case for a proper macro, i.e., `macro wrap_kernel(T) ... end` where `T` is the type of the custom user-defined `BaseKernel`. That feels a lot cleaner to me.
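A rough sketch of what such a macro could look like (an assumption mirroring the `wrap_kernel` function above, not code from the PR; only the first two methods are shown):

```julia
macro wrap_kernel(T)
    k = esc(T)  # resolve the kernel type in the caller's module
    quote
        # Make instances of the kernel type callable on a precomputed distance.
        (κ::$k)(d::Real) = kappa(κ, d)
        # Evaluate the kernel directly on a pair of input vectors.
        (κ::$k)(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}) = kappa(κ, x, y)
    end
end

# Hypothetical usage for a user-defined kernel type:
# @wrap_kernel MyKernel
```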
I am working on this on Slack right now. I am very bad at understanding macros.
What should

```julia
@kernel s * Kernel(transform, kernel_params...)
```

be rewritten to? To

```julia
s * TransformedKernel(Kernel(kernel_params...), transform)
```

? I'm still wondering if this does not lead to problems in cases such as

```julia
@kernel s * MaternKernel(42)
```

It seems a bit unintuitive to assume that a user would actually want to use a length scale of 42 in this case instead of the default constructor of a `MaternKernel` with parameter 42. IMO, in that example an explicit keyword argument such as `lengthscale` would be less confusing.
> What should `@kernel s * Kernel(transform, kernel_params...)` be rewritten to? To `s * TransformedKernel(Kernel(kernel_params...), transform)`?

Yes, exactly.

> It seems a bit unintuitive to assume that a user would actually want to use a length scale of 42 in this case instead of the default constructor of a MaternKernel with parameter 42

Hmm, I don't understand your point; it seems just as unintuitive to me that a user would use `nu = 42` for `MaternKernel`. That's why I think using keywords for kernel parameters is necessary. It is even more evident for kernels with multiple parameters like `PolynomialKernel`: what should `PolynomialKernel(2.0, 3.0)` be? Whereas `PolynomialKernel(c=2.0, d=3.0)` does not leave any doubt.

Having `lengthscale` as a keyword could be an additional way to do it, but I would be quite against having kernel parameters as positional parameters (as is now the case).
> Hmm, I don't understand your point; it seems just as unintuitive to me that a user would use nu = 42 for MaternKernel. That's why I think using keywords for kernel parameters is necessary.

Sure, using keyword arguments for kernel parameters might be more explicit and more intuitive, in particular in the case of kernels with multiple parameters. However, IMO it feels unintuitive and confusing if adding `@kernel` completely changes what the 42 in `MaternKernel(42)` means (even if one wants to encourage the use of keyword arguments, usually the default constructors still exist, so `MaternKernel(42)` might be a completely valid call of the default outer constructor).

> But I would be quite against having kernel parameters as positional parameters (as is now the case)

The problem I see is that you might be able to enforce that for the kernels defined in this package (by implementing an inner constructor that invalidates the default outer constructor and only adding an outer constructor with keyword arguments), but it is impossible to ensure that for user-defined kernels. Actually, I'm not even sure if it is a good idea to try to enforce that in the case of kernels with a single clearly defined parameter (such as maybe even `MaternKernel`).
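For concreteness, a sketch of that inner-constructor pattern (hypothetical; the field names `c` and `d` and the defaults are assumptions, not the package's actual definition):

```julia
struct PolynomialKernel{Tc<:Real,Td<:Real} <: BaseKernel
    c::Tc
    d::Td
    # Defining only this inner constructor suppresses the default positional
    # constructors, so PolynomialKernel(2.0, 3.0) throws a MethodError while
    # PolynomialKernel(c = 2.0, d = 3.0) works.
    function PolynomialKernel(; c::Real = 0.0, d::Real = 2.0)
        return new{typeof(c),typeof(d)}(c, d)
    end
end
```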
> usually the default constructors still exist, so MaternKernel(42) might be a completely valid call of the default outer constructor

My thought was to change the default constructor as well.

> it is impossible to ensure that for user-defined kernels.

We can be explicit about that in the docs.
I think I will finish the PR without the `@kernel` wrapper and remove all other wrappers (the lowercase thing); we can add `@kernel` in a different PR. I will just correct the docs, add tests, and merge the PR.
Following the discussion in #17, I removed the `transform` field (and the parametrization on it). Now all base kernels default to `IdentityTransform`. On top I created `TransformedKernel`, which contains a transform and a kernel, and `ScaledKernel`, which contains a premultiplying scalar.

Left to do: …