Add AbstractKKTSystem structure #58
Conversation
```julia
function get_coo_to_dense(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
    dense = Matrix{Float64}(undef, coo.m, coo.n)
    return dense, () -> copyto!(dense, coo)
end

copyto!(dense::Matrix{Tv}, coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer} = _copyto!(dense, coo.I, coo.J, coo.V)
function Matrix{Tv}(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
```
I added a proper constructor to build a dense matrix from a `SparseMatrixCOO`.
Isn't the expected behavior of constructors to "copy" the argument with a proper change in the type? The `SparseMatrixCSC` and `Matrix` constructors seem to just create an object with matching size/sparsity.
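For reference, a dense-from-COO constructor along the lines discussed here might look like this (a minimal sketch assuming the `SparseMatrixCOO` fields `m`, `n`, `I`, `J`, `V` shown later in this thread, with duplicate entries accumulated; not the PR's exact code):

```julia
# Sketch: scatter the COO triplets (I, J, V) into a freshly zeroed dense matrix.
function Base.Matrix{Tv}(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
    dense = zeros(Tv, coo.m, coo.n)
    for k in eachindex(coo.V)
        dense[coo.I[k], coo.J[k]] += coo.V[k]  # accumulate duplicate entries
    end
    return dense
end
```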
Codecov Report

```
@@            Coverage Diff             @@
##           master      #58      +/-  ##
==========================================
- Coverage   87.30%   86.82%   -0.49%
==========================================
  Files          28       30       +2
  Lines        3018     3150     +132
==========================================
+ Hits         2635     2735     +100
- Misses        383      415      +32
```

Continue to review the full report at Codecov.
Hey @frapac, the PR looks awesome! Indeed, `AbstractKKTSystem` can significantly enhance the modularity.
My only concern is the treatment of the "unreduced" system. I think treating it as an alternative KKT system formulation is the correct approach, but formulating it as a block 3x3 unreduced system requires some extra treatment. So, I'd suggest the following (see the dispatch sketch after this list):
- Remove the `reduced_system` option and let this be handled entirely by setting the `kkt_system` option.
- Make anything that ends with `_reduced!` (e.g., `set_aug_reduced!`) the default method.
- Anything that ends with `_unreduced!` can be made a special treatment for the unreduced system. For example, we can turn `set_aug_unreduced!(...)` into `set_aug_reduced!(::SparseUnreducedKKTSystem, ...)`.
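A minimal sketch of that multiple-dispatch pattern, under the assumption that the type and function names follow the thread (placeholder bodies only, not the PR's actual implementation):

```julia
abstract type AbstractKKTSystem end
struct SparseKKTSystem <: AbstractKKTSystem end            # reduced formulation
struct SparseUnreducedKKTSystem <: AbstractKKTSystem end   # block 3x3 unreduced formulation

# Default method: the `_reduced!` treatment applies to any KKT system...
set_aug_reduced!(kkt::AbstractKKTSystem, args...) = nothing        # placeholder body
# ...and the unreduced system overrides it via dispatch, replacing the
# former `reduced_system` option.
set_aug_reduced!(kkt::SparseUnreducedKKTSystem, args...) = nothing # placeholder body
```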
"Set scaling of Jacobian" | ||
function set_jacobian_scaling! end | ||
|
💯
Force-pushed from 2803dd3 to f49bd36:
* implement SparseReducedKKTSystem and SparseAugmentedKKTSystem
* refactor Solver
Force-pushed from 286c3b4 to 19c09a5.
Hey @frapac, I found a few more things. Hope we can figure out the cause of the slowdown soon 💪
src/MadNLP.jl
Outdated
include("nonlinearprogram.jl") | ||
include("matrixtools.jl") | ||
include(joinpath("LinearSolvers","linearsolvers.jl")) | ||
include(joinpath("kktsystem.jl")) | ||
include(joinpath("interiorpointsolver.jl")) |
Maybe we can:
```julia
include("kktsystem.jl")
include("interiorpointsolver.jl")
```
Good catch!
src/kktsystem.jl
Outdated
```julia
function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
    nnzJ = length(kkt.jac)
    for i in 1:nnzJ
```
Maybe this for loop needs to be made into a separate function, since the function arguments are not concrete types and many of the internal variables are referenced in the loop.
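For illustration, a function-barrier version might look like this (a sketch with a hypothetical kernel name `_set_jacobian_scaling!`; once the fields cross the barrier as plain arguments, the loop specializes on their concrete types):

```julia
function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
    # Pass the fields through a barrier so the inner loop compiles with inferred types.
    _set_jacobian_scaling!(kkt.jacobian_scaling, kkt.jac_raw.I, constraint_scaling)
end

function _set_jacobian_scaling!(jacobian_scaling, rows, constraint_scaling)
    @inbounds for i in eachindex(jacobian_scaling)
        jacobian_scaling[i] = constraint_scaling[rows[i]]
    end
    return jacobian_scaling
end
```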
Normally, if the types are properly specified in `kkt`, this should not be an issue, as the compiler is able to infer everything properly.
If we look at the allocations reported in this function, we observe that we do indeed have a problem:
```
  - function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
  0     nnzJ = length(kkt.jac)
128     for i in 1:nnzJ
  0         index = kkt.jac_raw.I[i]
704         kkt.jacobian_scaling[i] = constraint_scaling[index]
  -     end
  - end
```
If we look at it more closely, it appears that the field types in `SparseMatrixCOO` are not concrete:
```julia
mutable struct SparseMatrixCOO{Tv,Ti<:Integer} <: AbstractSparseMatrixCOO{Tv,Ti}
    m::Int
    n::Int
    I::AbstractArray{Ti,1}
    J::AbstractArray{Ti,1}
    V::AbstractArray{Tv,1}
end
```
hence the compiler has some trouble properly inferring the type of `kkt.jac_raw.I`...
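As an illustration of one fix direction (a sketch only; this parameterization is an assumption, not the PR's final code), the struct can be parameterized on the concrete array types so the fields become inferable:

```julia
# Hypothetical concretely-typed variant: VI and VV carry the actual array types,
# so kkt.jac_raw.I infers to a concrete type instead of AbstractArray{Ti,1}.
mutable struct SparseMatrixCOO{Tv, Ti<:Integer, VI<:AbstractVector{Ti}, VV<:AbstractVector{Tv}} <: AbstractSparseMatrixCOO{Tv,Ti}
    m::Int
    n::Int
    I::VI
    J::VI
    V::VV
end
```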
```julia
@@ -28,6 +29,15 @@ function findIJ(S::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
    return I,J
end

get_mapping(dest::AbstractMatrix, src::SparseMatrixCOO) = nothing

function diag!(dest::AbstractVector{T}, src::AbstractMatrix{T}) where T
```
Is `diag!` used somewhere?
It is used in the dense implementation (which lives in a separate branch).
Hey @frapac, I think I found the issue! Please check the comments and see if the suggested changes reduce the memory allocations.
src/interiorpointsolver.jl
Outdated
```julia
# KKT system updates -------------------------------------------------------
# Set diagonal
function set_aug_diagonal!(kkt::AbstractKKTSystem, ips::Solver)
    copyto!(kkt.pr_diag, ips.zl./(ips.x.-ips.xl) .+ ips.zu./(ips.xu.-ips.x))
```
Since Julia evaluates `ips.zl ./ (ips.x .- ips.xl) .+ ips.zu ./ (ips.xu .- ips.x)` first, it allocates a temporary array and then copies it to `kkt.pr_diag`. I recommend using `.=` instead (which doesn't allocate memory).
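For illustration, the suggested non-allocating form is a one-line change (a sketch of the suggestion above, fusing the broadcast into the destination):

```julia
# Fused broadcast assignment: writes directly into kkt.pr_diag, no temporary array.
kkt.pr_diag .= ips.zl ./ (ips.x .- ips.xl) .+ ips.zu ./ (ips.xu .- ips.x)
```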
Good catch! My bad: I did that previously only to support a `pr_diag` array allocated on the GPU; I should have reverted the change in this PR.
Indeed, fixing that reduces the allocations. Testing on `hs033.nl` (using the script `test/nlpmodel_tests.jl`), I get:
- Before: 0.001986 seconds (2.94 k allocations: 609.129 KiB)
- Now: 0.001987 seconds (2.94 k allocations: 413.418 KiB)
- Master (reference): 0.001582 seconds (2.78 k allocations: 389.809 KiB)
```julia
    ips._w1zl .= 0.
    ips._w1zu .= 0.
end
```
It seems like the functions in this section may allocate more memory than the original ones. The reason is that these functions now take non-concrete types (`Solver` and `AbstractKKTSystem`) as inputs, and the Julia compiler is not as efficient as it is for concrete types. I recommend introducing another layer; e.g.,
```julia
set_aug_rhs_ifr!(ips::Solver, kkt::AbstractKKTSystem) = set_aug_rhs_ifr_reduced!(ips._w1x, ips._w1l, ips.c)
function set_aug_rhs_ifr_reduced!(_w1x, _w1l, c)
    _w1x .= 0.0
    _w1l .= .-c
end
```
That's a good catch; I understand my mistake now. Almost all the attributes in `Solver` have non-concrete types... (in fact, Cthulhu reports a lot of inference errors even on master). As the types are non-concrete, the compiler is forced to allocate in this kind of function, since we are directly manipulating the attributes here.
Previously, we didn't have that issue, as the non-concrete types were passed to auxiliary functions, which specialized afterwards.
I would suggest the following:
- For this PR, we implement the specialization using auxiliary functions, as you suggested.
- Afterwards, I think it would be worthwhile to revisit the type system in MadNLP (i.e., use concrete types in all structures): that would help the compiler a lot, and IMO we could significantly reduce the time to first solve.
Yes... that's my bad. When I implemented MadNLP, I didn't know it would have this consequence 😢
I agree with your approach. For now, let's keep the auxiliary functions, and we can later do a thorough revisit of the type definitions in MadNLP. I can work on it once we merge this PR.
I think it's a common pitfall in Julia: writing type-stable code is definitely non-trivial. I am not sure the two-language problem was really resolved by Julia... Fortunately, the ecosystem is maturing to address this issue. Cthulhu is already quite useful for debugging type instability. I think JET.jl is also very promising: https://github.com/aviatesk/JET.jl
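For reference, a minimal, self-contained illustration of how JET.jl can flag this kind of instability (the `Unstable`/`sumfield` names are hypothetical, a toy reproduction of the abstract-field issue above):

```julia
using JET

# Toy reproduction of the pitfall: an abstractly-typed field
# (like the I/J/V fields of SparseMatrixCOO) forces runtime dispatch.
struct Unstable
    v::AbstractVector{Float64}
end
sumfield(u::Unstable) = sum(u.v)

# Static analysis: reports runtime-dispatch (and hence allocation) sites.
@report_opt sumfield(Unstable(rand(10)))
# Cthulhu offers an interactive equivalent: @descend sumfield(Unstable(rand(10)))
```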
Cool, I guess we can start from the Cthulhu.jl and JET.jl reports.
I have updated the code to force specialization on the faulty functions, to avoid unneeded allocations. On the instance
and we have now on this branch:
(we can compare only the allocations, as the comparison was made on different computers). In the medium term, I would be up for revisiting the type safety inside MadNLP. I think we could get some easy performance gains.
@frapac 🎉 yes, that would be very nice
So, it appears that I broke the tests, currently investigating what I did wrong ^^
Force-pushed from d8955a5 to 935fa96:
* deactivate logs in benchmark scripts
* benchmark: add verbose option to main script
```diff
- cnt.eval_function_time += @elapsed nlp.obj_grad!(∇f, view(x, 1:nlp.n))
- ∇f .*= ipp.obj_scale[]
+ cnt.eval_function_time += @elapsed nlp.obj_grad!(f, view(x, 1:nlp.n))
+ f .*= ipp.obj_scale[]
  cnt.obj_grad_cnt += 1
```
The issue was here; for some reason, passing `∇f` caused some extra allocations. Maybe the reason comes from CUTEst.jl. If the gradient callback checks the dimension of `f`, the current change can be problematic.
For a similar reason, if we replace `view(x, 1:nlp.n)` with `x`, allocations are significantly reduced. But if the gradient callback checks the input size, it can cause issues (and if the callbacks are not properly implemented, it may corrupt `f` and `x`).
Indeed, the issue was here:
https://github.com/JuliaSmoothOptimizers/CUTEst.jl/blob/main/src/julia_interface.jl#L105-L110
In order to make the callbacks C-function compatible, CUTEst.jl internally converts the inputs to `Vector{Float64}`. For "EIGMINA" in CUTEst, after allowing `StrideOneVector{Float64}` in the inputs, the allocations drop dramatically: from 1.213 MiB to 759.922 KiB.
I'll create a PR for CUTEst.jl. I think the benchmark results will look a lot better after this...
That's very interesting. Good catch on the issue with CUTEst.jl. Indeed, I think there is no need to create a new array if the input `x` is one-strided... I made this change because of the custom wrapper we developed for ExaPF: there, we need a gradient with the correct dimension (as we use a `mul!` operation afterwards to compute the adjoint). I don't know what the default should be: should we pass all arrays with the correct dimension to the callbacks, or should the callbacks do all the checks internally?
I think the latter would be ideal, but the performance of MadNLP shouldn't degrade too much even if the callback requires the correct dimension. I added a "buffered" option to the NLPModels.jl interface so that there is a buffer vector that exactly matches the input dimensions for the callbacks; the NLPModels.jl interface handles copying to/from the buffer vector. But if the user knows that there won't be a problem, the user can turn this off.
I made a PR for CUTEst.jl (JuliaSmoothOptimizers/CUTEst.jl#269) to allow view inputs. If this gets merged, we can turn off the buffered option for the benchmark.
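To illustrate the buffering idea (a minimal sketch with hypothetical names, not the actual NLPModels.jl interface code): preallocated buffers with the exact callback dimension let the solver pass views internally while the callback still receives plain `Vector{Float64}`s.

```julia
# Hypothetical sketch of the "buffered" callback pattern described above.
struct BufferedGradient{F}
    grad!::F                 # user callback expecting Vector{Float64} of length n
    xbuf::Vector{Float64}    # buffer matching the callback's input dimension
    gbuf::Vector{Float64}    # buffer for the gradient output
end
BufferedGradient(grad!, n::Integer) = BufferedGradient(grad!, zeros(n), zeros(n))

function (bg::BufferedGradient)(g::AbstractVector, x::AbstractVector)
    copyto!(bg.xbuf, x)          # copy the (possibly strided) input into the buffer
    bg.grad!(bg.gbuf, bg.xbuf)   # call the callback on plain Vector{Float64}s
    copyto!(g, bg.gbuf)          # copy the result back to the caller's array
    return g
end
```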
Thanks for the help in merging this PR, @sshin23!
- First step towards support of dense NLP
- `Options` moved to a new file `src/options.jl`
- `Solver`: move all KKT structures into a new field `kkt`
- `Solver`: remove closures and implement everything in proper functions
- `Solver`: callbacks are now implemented as proper Julia functions (better for the compiler, and allows the use of multiple dispatch in the future)