Add AbstractKKTSystem structure #58
Conversation
```julia
function get_coo_to_dense(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
    dense = Matrix{Float64}(undef, coo.m, coo.n)
    return dense, () -> copyto!(dense, coo)
end

copyto!(dense::Matrix{Tv}, coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer} = _copyto!(dense, coo.I, coo.J, coo.V)
function Matrix{Tv}(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
```
I added a proper constructor to build a dense matrix from a `SparseMatrixCOO`.
Isn't the expected behavior of constructors to "copy" the argument with a proper change in the type? The `SparseMatrixCSC` and `Matrix` constructors seem to just create an object with matching size/sparsity.
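For reference, a dense-from-COO constructor along the lines discussed here might look like this (a minimal sketch assuming the `SparseMatrixCOO` fields `m`, `n`, `I`, `J`, `V` shown later in this thread, with duplicate entries accumulated; not the PR's exact code):

```julia
# Sketch: scatter the COO triplets (I, J, V) into a freshly zeroed dense matrix.
function Base.Matrix{Tv}(coo::SparseMatrixCOO{Tv,Ti}) where {Tv,Ti<:Integer}
    dense = zeros(Tv, coo.m, coo.n)
    for k in eachindex(coo.V)
        dense[coo.I[k], coo.J[k]] += coo.V[k]  # accumulate duplicate entries
    end
    return dense
end
```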
Codecov Report

```
@@            Coverage Diff             @@
##           master      #58      +/-  ##
==========================================
- Coverage   87.30%   86.82%   -0.49%
==========================================
  Files          28       30       +2
  Lines        3018     3150     +132
==========================================
+ Hits         2635     2735     +100
- Misses        383      415      +32
```

Continue to review the full report at Codecov.
Hey @frapac, the PR looks awesome! Indeed, `AbstractKKTSystem` can significantly enhance the modularity.
My only concern is the treatment of the "unreduced" system. I think treating it as an alternative KKT system formulation is the correct approach, but formulating it as a block 3x3 unreduced system requires some extra treatment. So, I'd suggest the following (see the dispatch sketch after this list):
- Remove the `reduced_system` option and let this be handled entirely by setting the `kkt_system` option.
- Make anything that ends with `_reduced!` (e.g., `set_aug_reduced!`) the default method.
- Anything that ends with `_unreduced!` can be made a special treatment for the unreduced system. For example, we can turn `set_aug_unreduced!(...)` into `set_aug_reduced!(::SparseUnreducedKKTSystem, ...)`.
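A minimal sketch of that multiple-dispatch pattern, under the assumption that the type and function names follow the thread (placeholder bodies only, not the PR's actual implementation):

```julia
abstract type AbstractKKTSystem end
struct SparseKKTSystem <: AbstractKKTSystem end            # reduced formulation
struct SparseUnreducedKKTSystem <: AbstractKKTSystem end   # block 3x3 unreduced formulation

# Default method: the `_reduced!` treatment applies to any KKT system...
set_aug_reduced!(kkt::AbstractKKTSystem, args...) = nothing        # placeholder body
# ...and the unreduced system overrides it via dispatch, replacing the
# former `reduced_system` option.
set_aug_reduced!(kkt::SparseUnreducedKKTSystem, args...) = nothing # placeholder body
```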
"Set scaling of Jacobian" | ||
function set_jacobian_scaling! end | ||
|
💯
Force-pushed from 2803dd3 to f49bd36:
* implement SparseReducedKKTSystem and SparseAugmentedKKTSystem
* refactor Solver
Force-pushed from 286c3b4 to 19c09a5.
Hey @frapac, I found a few more things. Hope we can figure out the cause of the slowdown soon 💪
src/MadNLP.jl
Outdated
include("nonlinearprogram.jl") | ||
include("matrixtools.jl") | ||
include(joinpath("LinearSolvers","linearsolvers.jl")) | ||
include(joinpath("kktsystem.jl")) | ||
include(joinpath("interiorpointsolver.jl")) |
Maybe we can:
```julia
include("kktsystem.jl")
include("interiorpointsolver.jl")
```
Good catch!
src/kktsystem.jl
Outdated
```julia
function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
    nnzJ = length(kkt.jac)
    for i in 1:nnzJ
```
Maybe this for loop needs to be made into a separate function, since the function arguments are not concrete types and many of the internal variables are referenced in the loop.
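For illustration, a function-barrier version might look like this (a sketch with a hypothetical kernel name `_set_jacobian_scaling!`; once the fields cross the barrier as plain arguments, the loop specializes on their concrete types):

```julia
function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
    # Pass the fields through a barrier so the inner loop compiles with inferred types.
    _set_jacobian_scaling!(kkt.jacobian_scaling, kkt.jac_raw.I, constraint_scaling)
end

function _set_jacobian_scaling!(jacobian_scaling, rows, constraint_scaling)
    @inbounds for i in eachindex(jacobian_scaling)
        jacobian_scaling[i] = constraint_scaling[rows[i]]
    end
    return jacobian_scaling
end
```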
Normally, if the types are properly specified in `kkt`, this should not be an issue, as the compiler is able to infer everything properly.
If we look at the allocations reported in this function, we observe that we do indeed have a problem:
```
  - function set_jacobian_scaling!(kkt::AbstractSparseKKTSystem, constraint_scaling::AbstractVector)
  0     nnzJ = length(kkt.jac)
128     for i in 1:nnzJ
  0         index = kkt.jac_raw.I[i]
704         kkt.jacobian_scaling[i] = constraint_scaling[index]
  -     end
  - end
```
If we look at it more closely, it appears that the field types in `SparseMatrixCOO` are not concrete:
```julia
mutable struct SparseMatrixCOO{Tv,Ti<:Integer} <: AbstractSparseMatrixCOO{Tv,Ti}
    m::Int
    n::Int
    I::AbstractArray{Ti,1}
    J::AbstractArray{Ti,1}
    V::AbstractArray{Tv,1}
end
```
hence the compiler has some trouble properly inferring the type of `kkt.jac_raw.I`...
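As an illustration of one fix direction (a sketch only; this parameterization is an assumption, not the PR's final code), the struct can be parameterized on the concrete array types so the fields become inferable:

```julia
# Hypothetical concretely-typed variant: VI and VV carry the actual array types,
# so kkt.jac_raw.I infers to a concrete type instead of AbstractArray{Ti,1}.
mutable struct SparseMatrixCOO{Tv, Ti<:Integer, VI<:AbstractVector{Ti}, VV<:AbstractVector{Tv}} <: AbstractSparseMatrixCOO{Tv,Ti}
    m::Int
    n::Int
    I::VI
    J::VI
    V::VV
end
```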
```julia
@@ -28,6 +29,15 @@ function findIJ(S::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
    return I,J
end

get_mapping(dest::AbstractMatrix, src::SparseMatrixCOO) = nothing

function diag!(dest::AbstractVector{T}, src::AbstractMatrix{T}) where T
```
Is `diag!` used somewhere?
It is used in the dense implementation (which lives in a separate branch).
Hey @frapac, I think I found the issue! Please check the comments and see if the suggested changes reduce the memory allocations.
src/interiorpointsolver.jl
Outdated
```julia
# KKT system updates -------------------------------------------------------
# Set diagonal
function set_aug_diagonal!(kkt::AbstractKKTSystem, ips::Solver)
    copyto!(kkt.pr_diag, ips.zl./(ips.x.-ips.xl) .+ ips.zu./(ips.xu.-ips.x))
```
Since Julia evaluates `ips.zl ./ (ips.x .- ips.xl) .+ ips.zu ./ (ips.xu .- ips.x)` first, it allocates a temporary array and then copies it to `kkt.pr_diag`. I recommend using `.=` instead (which doesn't allocate memory).
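For illustration, the suggested non-allocating form is a one-line change (a sketch of the suggestion above, fusing the broadcast into the destination):

```julia
# Fused broadcast assignment: writes directly into kkt.pr_diag, no temporary array.
kkt.pr_diag .= ips.zl ./ (ips.x .- ips.xl) .+ ips.zu ./ (ips.xu .- ips.x)
```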
Good catch! My bad: I did that previously only to support a `pr_diag` array allocated on the GPU; I should have reverted the change in this PR.
Indeed, fixing that reduces the allocations. Testing on `hs033.nl` (using the script `test/nlpmodel_tests.jl`), I get:
- Before: 0.001986 seconds (2.94 k allocations: 609.129 KiB)
- Now: 0.001987 seconds (2.94 k allocations: 413.418 KiB)
- Master (reference): 0.001582 seconds (2.78 k allocations: 389.809 KiB)
```julia
    ips._w1zl .= 0.
    ips._w1zu .= 0.
end
```
It seems like the functions in this section may allocate more memory than the original ones. The reason is that these functions now take non-concrete types (`Solver` and `AbstractKKTSystem`) as inputs, and the Julia compiler is not as efficient as it is for concrete types. I recommend introducing another layer; e.g.,
```julia
set_aug_rhs_ifr!(ips::Solver, kkt::AbstractKKTSystem) = set_aug_rhs_ifr_reduced!(ips._w1x, ips._w1l, ips.c)
function set_aug_rhs_ifr_reduced!(_w1x, _w1l, c)
    _w1x .= 0.0
    _w1l .= .-c
end
```
That's a good catch; I understand my mistake now. Almost all the attributes in `Solver` have non-concrete types... (in fact, Cthulhu reports a lot of inference errors even on master). As the types are non-concrete, the compiler is forced to allocate in this kind of function, since we are directly manipulating the attributes here.
Previously, we didn't have that issue, as the non-concrete types were passed to auxiliary functions, which specialized afterwards.
I would suggest the following:
- For this PR, we implement the specialization using auxiliary functions, as you suggested.
- Afterwards, I think it would be worthwhile to revisit the type system in MadNLP (i.e., use concrete types in all structures): that would help the compiler a lot, and IMO we could significantly reduce the time to first solve.
Yes... that's my bad. When I implemented MadNLP, I didn't know it would have this consequence 😢
I agree with your approach. For now, let's keep the auxiliary functions, and we can later do a thorough revisit of the type definitions in MadNLP. I can work on it once we merge this PR.
I think it's a common pitfall in Julia: writing type-stable code is definitely non-trivial. I am not sure the two-language problem was really resolved by Julia... Fortunately, the ecosystem is maturing to address this issue. Cthulhu is already quite useful for debugging type instability. I think JET.jl is also very promising: https://github.com/aviatesk/JET.jl
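For reference, a minimal, self-contained illustration of how JET.jl can flag this kind of instability (the `Unstable`/`sumfield` names are hypothetical, a toy reproduction of the abstract-field issue above):

```julia
using JET

# Toy reproduction of the pitfall: an abstractly-typed field
# (like the I/J/V fields of SparseMatrixCOO) forces runtime dispatch.
struct Unstable
    v::AbstractVector{Float64}
end
sumfield(u::Unstable) = sum(u.v)

# Static analysis: reports runtime-dispatch (and hence allocation) sites.
@report_opt sumfield(Unstable(rand(10)))
# Cthulhu offers an interactive equivalent: @descend sumfield(Unstable(rand(10)))
```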
Cool, I guess we can start from the Cthulhu.jl and JET.jl reports.
I have updated the code to force specialization on the faulty functions, to avoid unneeded allocations. On the instance
and we have now on this branch:
(we can compare only the allocations, as the comparison was made on different computers). In the medium term, I would be up for revisiting the type safety inside MadNLP. I think we could get some easy performance gains.
@frapac 🎉 yes, that would be very nice
So, it appears that I broke the tests, currently investigating what I did wrong ^^
Force-pushed from d8955a5 to 935fa96:
* deactivate logs in benchmark scripts
* benchmark: add verbose option to main script
```diff
- cnt.eval_function_time += @elapsed nlp.obj_grad!(∇f, view(x, 1:nlp.n))
- ∇f .*= ipp.obj_scale[]
+ cnt.eval_function_time += @elapsed nlp.obj_grad!(f, view(x, 1:nlp.n))
+ f .*= ipp.obj_scale[]
  cnt.obj_grad_cnt += 1
```
The issue was here; for some reason, passing `∇f` caused some extra allocations. Maybe the reason comes from CUTEst.jl. If the gradient callback checks the dimension of `f`, the current change can be problematic.
For a similar reason, if we replace `view(x, 1:nlp.n)` with `x`, allocations are significantly reduced. But if the gradient callback checks the input size, it can cause issues (and if the callbacks are not properly implemented, it may corrupt `f` and `x`).
Indeed, the issue was here:
https://github.com/JuliaSmoothOptimizers/CUTEst.jl/blob/main/src/julia_interface.jl#L105-L110
In order to make the callbacks C-function compatible, CUTEst.jl internally converts the inputs to `Vector{Float64}`. For "EIGMINA" in CUTEst, after allowing `StrideOneVector{Float64}` in the inputs, the allocations drop dramatically: from 1.213 MiB to 759.922 KiB.
I'll create a PR for CUTEst.jl. I think the benchmark results will look a lot better after this...
That's very interesting. Good catch on the issue with CUTEst.jl. Indeed, I think there is no need to create a new array if the input `x` is one-strided... I made this change because of the custom wrapper we developed for ExaPF: there, we need a gradient with the correct dimension (as we use a `mul!` operation afterwards to compute the adjoint). I don't know what the default should be: should we pass all arrays with the correct dimension to the callbacks, or should the callbacks do all the checks internally?
I think the latter would be ideal, but the performance of MadNLP shouldn't degrade too much even if the callback requires the correct dimension. I added a "buffered" option to the NLPModels.jl interface so that there is a buffer vector that exactly matches the input dimensions for the callbacks; the NLPModels.jl interface handles copying to/from the buffer vector. But if the user knows that there won't be a problem, the user can turn this off.
I made a PR for CUTEst.jl (JuliaSmoothOptimizers/CUTEst.jl#269) to allow view inputs. If this gets merged, we can turn off the buffered option for the benchmark.
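To illustrate the buffering idea (a minimal sketch with hypothetical names, not the actual NLPModels.jl interface code): preallocated buffers with the exact callback dimension let the solver pass views internally while the callback still receives plain `Vector{Float64}`s.

```julia
# Hypothetical sketch of the "buffered" callback pattern described above.
struct BufferedGradient{F}
    grad!::F                 # user callback expecting Vector{Float64} of length n
    xbuf::Vector{Float64}    # buffer matching the callback's input dimension
    gbuf::Vector{Float64}    # buffer for the gradient output
end
BufferedGradient(grad!, n::Integer) = BufferedGradient(grad!, zeros(n), zeros(n))

function (bg::BufferedGradient)(g::AbstractVector, x::AbstractVector)
    copyto!(bg.xbuf, x)          # copy the (possibly strided) input into the buffer
    bg.grad!(bg.gbuf, bg.xbuf)   # call the callback on plain Vector{Float64}s
    copyto!(g, bg.gbuf)          # copy the result back to the caller's array
    return g
end
```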
Thanks for the help in merging this PR, @sshin23!
- First step towards support of dense NLP
- `Options` moved to a new file `src/options.jl`
- `Solver`: move all KKT structures into a new field `kkt`
- `Solver`: remove closures and implement everything in proper functions
- `Solver`: callbacks are now implemented as proper Julia functions (better for the compiler, and allows the use of multiple dispatch in the future)