Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sparse summation and indexing #70

Merged
merged 3 commits into from
Aug 15, 2019

Conversation

dmuldrew
Copy link
Contributor

@dmuldrew dmuldrew commented Jul 3, 2019

We profiled our MOST simulation using the Matlab profiler and found that the majority of our computation time was due to indexed assignment and summation of sparse matrices.

Refactoring these functions with a new implementation modeled after this blog post:
https://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/
results in significant performance improvement for our model.

We have done some testing of the changes within the debugger and plan to add an automated test. Suggestions of the best way to do this would be welcome...

@rdzman
Copy link
Member

rdzman commented Jul 5, 2019

Thanks @dmuldrew. There are some automated tests in t_opf_model (currently failing one test) that cover at least some of the functionality. I don't remember off-hand how thorough it is.

If we can get a consistent speed-up in this code, that would be wonderful, especially for MOST. Why don't you see if you can track down the bug in params_quad_cost that's causing the failing test and meanwhile I'll try to find some time to wrap my mind around the details. I've done a fair amount of Matlab profiling, but it looks like I'm about to learn some new tricks. 😉

Also, could you remove the modification to scale_load.m since any change there would be handled as a separate issue? Thanks.

@dmuldrew
Copy link
Contributor Author

dmuldrew commented Jul 5, 2019

Sure, I’ll investigate the failing test error and make the requested changes...we’re seeing somewhere around a 80-90% speed up. In case it helps, here’s a shorter post which started my investigation down this path:
https://www.mathworks.com/matlabcentral/answers/203734-most-efficient-way-to-add-multiple-sparse-matrices-in-a-loop-in-matlab

@dmuldrew
Copy link
Contributor Author

dmuldrew commented Jul 8, 2019

I removed the scale_load.m change. I haven't been able to repro the travis-ci error yet... It seems to occur using Octave as a vertcat dimension mismatch error and you also mentioned it's within params_quad_cost.m

@rdzman
Copy link
Member

rdzman commented Jul 9, 2019

t_opf_model runs without any fatal errors, but it fails test 421 ...

ok 420 - om.params_quad_cost('qc', {2, 2}) : k
not ok 421 - om.params_quad_cost() : Q
    index              got             expected      abs(got - exp)
---------------  ----------------  ----------------  ----------------
         (1,1)               171               179                 8
         (4,1)               175               167                 8
         (2,2)                 0                 5                 5
         (3,2)                 3                 0                 3
         (5,2)                 5                 0                 5
         (7,2)                 0                 3                 3
         (2,3)                 2                 0                 2
         (3,3)                 0                 6                 6
         (6,3)                 6                 2                 4
         (1,4)                 5                 4                 1
    .
    .
    .

I'd assumed that the subsequent fatal error in t_opf_default() was caused by the same issue, but it looks like that one is related to params_lin_constraint().

.
.
.
ok 141 - DEFAULT : w/ignored angle difference limits : branch flow
ok 142 - DEFAULT : w/ignored angle difference limits : branch mu
Error using vertcat
Dimensions of matrices being concatenated are not consistent.

Error in opt_model/params_lin_constraint (line 113)
        I = vertcat(sparseInds{:,1});

Error in mipsopf_solver (line 82)
[A, l, u] = om.params_lin_constraint();

Error in opf_execute (line 86)
      [results, success, raw] = mipsopf_solver(om, mpopt);

Error in opf (line 232)
    [results, success, raw] = opf_execute(om, mpopt);

Error in runopf (line 75)
[r, success] = opf(casedata, mpopt);

Error in t_opf_default (line 340)
    r = runopf(mpc, mpopt);

Not sure yet what the issue is, but we should add some tests to t_opf_model() to catch it before fixing it.

@rdzman
Copy link
Member

rdzman commented Jul 9, 2019

Started looking more closely at the code and found both issues. I'll be pushing some updates shortly.

  1. In params_lin_constraints() when Ak has a single row with multiple non-zeros, find() returns row vectors instead of column vectors, hence the vertcat error.

  2. In params_quad_cost the code was not handling the case where Q was specified as a column vector (diagonal elements of a diagonal Q matrix). Also, since c is a dense vector, it's probably not worthwhile to convert it to sparse to use this technique.

@dmuldrew
Copy link
Contributor Author

dmuldrew commented Jul 9, 2019

Great to hear...our simulation doesn't hit these particular edge cases so it would have taken me longer to debug the error. Hopefully you're seeing a speedup with MOST as well?

@dmuldrew
Copy link
Contributor Author

dmuldrew commented Jul 9, 2019

I'm definitely seeing an speed improvement with previous sparse c vector implementation (0.5s vs 14s total time for 1150 function calls). Perhaps we might use the similar non-sparse accumarray function:
https://blogs.mathworks.com/loren/2008/02/20/under-appreciated-accumarray/
https://www.mathworks.com/help/matlab/ref/accumarray.html

Change in params_quad_cost:
subs = cell2mat(c_Inds(:,1:2));
values = cell2mat(c_Inds(:,3));
c = accumarray(subs, values, [nx,1]);

@rdzman
Copy link
Member

rdzman commented Jul 10, 2019

It turns out that I'm seeing much less improvement than I expected. For example, the MOST tests, which are admittedly small, show negligible change, except for t_most_w_ds.

In fact, I went through a few other functions (eval_nln_constraint, eval_nln_constraint_hess, eval_nln_cost, params_legacy_cost) and updated them to use the same technique, hoping for significant improvements for large AC OPF problems. But I found that the result was typically a small slow-down, rather than an improvement.

I suspect the only place where this technique is going to show significant improvement is for certain kinds of MOST cases where the current implementation uses a very large number of sparse additions.

In any case, would it be possible for you to send me privately the MOST case you're running? A .mat file with the mdi and mpopt values would be sufficient. I'd like to test things on a case that shows the kind of speedups you're seeing. I don't have a case handy where the time to set up the problem is significant relative to the time required to solve it.

So far t_most_w_ds is the best I've encountered. On my machine the setup time is cut from 12-13s down to 7-8s, with a solve time is around 12s. I'll check the sparse c vector thing again to see if I can observe the improvements you mention.

@dmuldrew
Copy link
Contributor Author

dmuldrew commented Jul 10, 2019

Yes, between the two implementations there does appear to be a run-time transition point. It appears to be a function of number of sparse matrices to be summed (and maybe density of the matrices?).

Here's a toy model I used to test this:
https://gist.github.com/dmuldrew/3c167b6bf3b598c7b4e5818e6b5ed7d6
ScalingGraph

I'll ask about the possibility of sharing a mdi or mpopt file.

@rdzman
Copy link
Member

rdzman commented Jul 10, 2019

Interesting! I haven't played with that code yet (and may not get a chance to anytime soon), but I wonder how the plot changes for smaller or larger values of N.

Also, any idea what the corresponding values of numLoops and N would be for your MOST runs?

@dmuldrew
Copy link
Contributor Author

dmuldrew commented Jul 10, 2019

For us, the value of om.lin.NS is ~1300. Individual Ak linear constraint matrices approximated as square matrices that have values N=4300 and density of 0.00023.

Plugging these parameters into the toy model gives:
numLoops: 1300
ConstructSparse: 1.3257 seconds
AddSparse: 43.1034 seconds

The transition point for these Ak is somewhere around numLoops ~40
RuntimeInflection
I'm guessing that the transition point occurs at lower numLoop values when the total # of nonzeros in the Ak matrices increases.

Code analyzer says it's faster.
@rdzman
Copy link
Member

rdzman commented Aug 15, 2019

I confess I had a hard time deciding when to use this new method and when not to. For large numbers of constraint and cost sets with large matrices, it's a clear win. However, when that's not the case, I found the timing to vary a lot.

In the end I went with the new method when the number of constraint or cost sets (number of loops) was greater than 100 (or greater than 25 if the matrix had more than 300 cols). The attached file (pr70.zip) has some data and code I used in the process (saved here for posterity).

I'll be re-basing my changes on the latest master and pushing them here momentarily.

dmuldrew and others added 2 commits August 15, 2019 16:37
- @opt_model/params_lin_constraint()
- @opt_model/params_quad_cost()

Should give major boost to setup phase for some MOST runs.

Note: Similar refactoring in the following functions does not result in improved performance:
- @opt_model/eval_nln_constraint()
- @opt_model/eval_nln_constraint_hess()
- @opt_model/eval_nln_cost()
- @opt_model/params_legacy_cost()

Ref: MATPOWER#70

 (+8 squashed commits)
Squashed commits:
[5a0723e] Fix several issues in new implementation of params_quad_cost()

- Q specified as column vector was not being handled
- varsets were only being used to order columns, but they need to order rows as well
- c is kept as dense vector
[8f12a17] Fix issue with single-row constraints causing vertcat dimension mismatch error.
[2fc302b] Include single row constraint in t_opf_model().

- Uncovers fatal error in new implementation of @opt_model/params_lin_constraint().
[4cffa8d] docs: added more comments to changed sections
[7cc64b6] refactor: update params_quad_cost.m with faster sparse matrix summations
[41ae817] refactor: update params_lin_constraint.m with faster sparse matrix summations
@rdzman rdzman merged commit ff015af into MATPOWER:master Aug 15, 2019
@dmuldrew
Copy link
Contributor Author

Glad to see this method incorporated into master and that the performance improvements will help others out:
MATPOWER/most#7

I'm planning to submit a second PR with a few other sparse matrix tweaks we're also using...

@lordleoo
Copy link

Glad to see this method incorporated into master and that the performance improvements will help others out:
MATPOWER/most#7

I'm planning to submit a second PR with a few other sparse matrix tweaks we're also using...

Thanks a lot; You're a life saver
I look forward to your future PRs. I can use any speed-up tweak I can get.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants