performance improvement #64
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
- Coverage   86.88%   86.58%   -0.31%
==========================================
  Files          30       30
  Lines        3150     3109      -41
==========================================
- Hits         2737     2692      -45
- Misses        413      417       +4
Overall it looks good to me. I am just not sure I understand why we get rid of the SparseUnreducedKKTSystem here. Was it really detrimental to performance?
About the @simd macro, I am wondering whether, in the long term, we could incorporate a proper package for SIMD vectorization in MadNLP. E.g.
src/interiorpointsolver.jl (outdated)
@@ -709,9 +690,8 @@ function regular!(ips::AbstractInteriorPointSolver)
      switching_condition = is_switching(varphi_d,ips.alpha,ips.opt.s_phi,ips.opt.delta,2.,ips.opt.s_theta)
      armijo_condition = false
      while true
-         ips.x_trial .= ips.x .+ ips.alpha.*ips.dx
+         apply_step!(ips.x_trial,ips.x,ips.dx,ips.alpha)
Instead of introducing a new function apply_step!, how about calling the BLAS routines directly?
copyto!(ips.x_trial, ips.x)
axpy!(ips.alpha, ips.dx, ips.x_trial)
Yes, using BLAS makes more sense 👍
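To make the suggestion concrete, here is a minimal stand-alone sketch of the copyto!/axpy! pattern, using hypothetical data in place of the ips fields (the variable names x, dx, alpha, x_trial mirror the diff, but the values are made up for illustration):

```julia
using LinearAlgebra  # provides axpy!, which dispatches to BLAS for dense Float64 vectors

# Stand-in data; in the solver these would be fields of ips.
x  = [1.0, 2.0, 3.0]
dx = [0.5, -1.0, 2.0]
alpha = 0.1
x_trial = similar(x)

# Equivalent to the broadcast x_trial .= x .+ alpha .* dx,
# expressed as two in-place BLAS-style calls with no temporary array:
copyto!(x_trial, x)        # x_trial <- x
axpy!(alpha, dx, x_trial)  # x_trial <- alpha*dx + x_trial
```

The broadcast form fuses into a single loop as well, so the two should perform similarly; the BLAS form mainly avoids defining a new helper function.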
src/interiorpointsolver.jl (outdated)
-     ips.l.+=ips.alpha.*ips.dl
      ips.zl_r.+=ips.alpha_z.*ips.dzl
      ips.zu_r.+=ips.alpha_z.*ips.dzu
+     apply_step!(ips.l,ips.l,ips.dl,ips.alpha)
Same here?
axpy!(ips.alpha, ips.dl, ips.l)
-     _set_aug_diagonal_unreduced!(kkt.pr_diag, kkt.du_diag, kkt.l_lower, kkt.u_lower, kkt.l_diag, kkt.u_diag,
-                                  ips.zl_r, ips.zu_r, ips.xl_r, ips.xu_r, ips.x_lr, ips.x_ur,
-     )
+     kkt.pr_diag .= ips.zl./(ips.x.-ips.xl) .+ ips.zu./(ips.xu.-ips.x)
Nice that we can get rid of the auxiliary function!
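The broadcast that replaces the auxiliary function is easy to sketch on its own. Below is a stand-alone version with plain (hypothetical) vectors in place of the kkt/ips fields; it computes the primal diagonal zl/(x - xl) + zu/(xu - x) for bound-constrained variables:

```julia
# Hypothetical data standing in for ips.x, ips.xl, ips.xu, ips.zl, ips.zu;
# x must lie strictly inside [xl, xu] for the divisions to be well defined.
x  = [0.5, 1.0]
xl = [0.0, 0.0]; xu = [2.0, 2.0]
zl = [1.0, 0.5]; zu = [0.2, 0.1]

# Same shape as the line in the diff: the dotted operators fuse the
# whole expression into one loop with no intermediate arrays.
pr_diag = zl ./ (x .- xl) .+ zu ./ (xu .- x)
```

In the solver the left-hand side is the preallocated kkt.pr_diag, so the fused broadcast assigns in place without allocating.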
      xll = x_lr[i]-xl_r[i]
      @inbounds xll < 0 && return Inf
      @inbounds varphi -= mu*log(xll)
+     @simd for i=1:length(x_lr)
Do we observe a difference when using @simd?
Not quite sure; I think that will need a separate benchmark.
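A quick way to get a first answer is a rough timing sketch like the one below (not a rigorous benchmark; BenchmarkTools.jl would be the proper tool). The two kernels are hypothetical reductions shaped like the barrier-term loop in the diff; note that the @simd version drops the early return, since @simd assumes iterations are safe to reorder:

```julia
# Plain loop, with the early exit from the diff.
function barrier_plain(x_lr, xl_r, mu)
    varphi = 0.0
    for i in 1:length(x_lr)
        xll = x_lr[i] - xl_r[i]
        xll < 0 && return Inf
        varphi -= mu * log(xll)
    end
    return varphi
end

# @simd variant: early return removed, bounds checks elided.
function barrier_simd(x_lr, xl_r, mu)
    varphi = 0.0
    @simd for i in 1:length(x_lr)
        @inbounds xll = x_lr[i] - xl_r[i]
        varphi -= mu * log(xll)
    end
    return varphi
end

# Hypothetical data chosen so that x_lr[i] - xl_r[i] > 0 everywhere.
x_lr = rand(10^6) .+ 1.0
xl_r = rand(10^6)
mu = 0.1

barrier_plain(x_lr, xl_r, mu)  # warm-up calls to exclude compilation time
barrier_simd(x_lr, xl_r, mu)
t_plain = @elapsed barrier_plain(x_lr, xl_r, mu)
t_simd  = @elapsed barrier_simd(x_lr, xl_r, mu)
```

Because @simd allows the reduction to be reassociated, the two results can differ by rounding; any speedup will also depend on whether the log call vectorizes on the target machine.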
This PR looks good to me now. Nice that we can remove the auxiliary functions we implemented in #58!
I will start working on implementing the KKT right-hand-side structure once this PR is merged into master.
In this PR, we make several changes to improve performance.