Refactor sparse summation and indexing #70

dmuldrew · 2019-07-03T23:11:17Z

We profiled our MOST simulation using the Matlab profiler and found that the majority of our computation time was due to indexed assignment and summation of sparse matrices.

Refactoring these functions with a new implementation modeled after this blog post:
https://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/
results in significant performance improvement for our model.

We have done some testing of the changes within the debugger and plan to add an automated test. Suggestions of the best way to do this would be welcome...

rdzman · 2019-07-05T22:13:16Z

Thanks @dmuldrew. There are some automated tests in t_opf_model (currently failing one test) that cover at least some of the functionality. I don't remember off-hand how thorough it is.

If we can get a consistent speed-up in this code, that would be wonderful, especially for MOST. Why don't you see if you can track down the bug in params_quad_cost that's causing the failing test and meanwhile I'll try to find some time to wrap my mind around the details. I've done a fair amount of Matlab profiling, but it looks like I'm about to learn some new tricks. 😉

Also, could you remove the modification to scale_load.m since any change there would be handled as a separate issue? Thanks.

dmuldrew · 2019-07-05T22:45:05Z

Sure, I’ll investigate the failing test error and make the requested changes...we’re seeing somewhere around a 80-90% speed up. In case it helps, here’s a shorter post which started my investigation down this path:
https://www.mathworks.com/matlabcentral/answers/203734-most-efficient-way-to-add-multiple-sparse-matrices-in-a-loop-in-matlab

dmuldrew · 2019-07-08T21:48:58Z

I removed the scale_load.m change. I haven't been able to repro the travis-ci error yet... It seems to occur using Octave as a vertcat dimension mismatch error and you also mentioned it's within params_quad_cost.m

rdzman · 2019-07-09T14:40:32Z

t_opf_model runs without any fatal errors, but it fails test 421 ...

ok 420 - om.params_quad_cost('qc', {2, 2}) : k
not ok 421 - om.params_quad_cost() : Q
    index              got             expected      abs(got - exp)
---------------  ----------------  ----------------  ----------------
         (1,1)               171               179                 8
         (4,1)               175               167                 8
         (2,2)                 0                 5                 5
         (3,2)                 3                 0                 3
         (5,2)                 5                 0                 5
         (7,2)                 0                 3                 3
         (2,3)                 2                 0                 2
         (3,3)                 0                 6                 6
         (6,3)                 6                 2                 4
         (1,4)                 5                 4                 1
    .
    .
    .

I'd assumed that the subsequent fatal error in t_opf_default() was caused by the same issue, but it looks like that one is related to params_lin_constraint().

.
.
.
ok 141 - DEFAULT : w/ignored angle difference limits : branch flow
ok 142 - DEFAULT : w/ignored angle difference limits : branch mu
Error using vertcat
Dimensions of matrices being concatenated are not consistent.

Error in opt_model/params_lin_constraint (line 113)
        I = vertcat(sparseInds{:,1});

Error in mipsopf_solver (line 82)
[A, l, u] = om.params_lin_constraint();

Error in opf_execute (line 86)
      [results, success, raw] = mipsopf_solver(om, mpopt);

Error in opf (line 232)
    [results, success, raw] = opf_execute(om, mpopt);

Error in runopf (line 75)
[r, success] = opf(casedata, mpopt);

Error in t_opf_default (line 340)
    r = runopf(mpc, mpopt);

Not sure yet what the issue is, but we should add some tests to t_opf_model() to catch it before fixing it.

rdzman · 2019-07-09T16:28:22Z

Started looking more closely at the code and found both issues. I'll be pushing some updates shortly.

In params_lin_constraints() when Ak has a single row with multiple non-zeros, find() returns row vectors instead of column vectors, hence the vertcat error.
In params_quad_cost the code was not handling the case where Q was specified as a column vector (diagonal elements of a diagonal Q matrix). Also, since c is a dense vector, it's probably not worthwhile to convert it to sparse to use this technique.

dmuldrew · 2019-07-09T21:18:41Z

Great to hear...our simulation doesn't hit these particular edge cases so it would have taken me longer to debug the error. Hopefully you're seeing a speedup with MOST as well?

dmuldrew · 2019-07-09T22:06:46Z

I'm definitely seeing an speed improvement with previous sparse c vector implementation (0.5s vs 14s total time for 1150 function calls). Perhaps we might use the similar non-sparse accumarray function:
https://blogs.mathworks.com/loren/2008/02/20/under-appreciated-accumarray/
https://www.mathworks.com/help/matlab/ref/accumarray.html

Change in params_quad_cost:
subs = cell2mat(c_Inds(:,1:2));
values = cell2mat(c_Inds(:,3));
c = accumarray(subs, values, [nx,1]);

rdzman · 2019-07-10T15:40:09Z

It turns out that I'm seeing much less improvement than I expected. For example, the MOST tests, which are admittedly small, show negligible change, except for t_most_w_ds.

In fact, I went through a few other functions (eval_nln_constraint, eval_nln_constraint_hess, eval_nln_cost, params_legacy_cost) and updated them to use the same technique, hoping for significant improvements for large AC OPF problems. But I found that the result was typically a small slow-down, rather than an improvement.

I suspect the only place where this technique is going to show significant improvement is for certain kinds of MOST cases where the current implementation uses a very large number of sparse additions.

In any case, would it be possible for you to send me privately the MOST case you're running? A .mat file with the mdi and mpopt values would be sufficient. I'd like to test things on a case that shows the kind of speedups you're seeing. I don't have a case handy where the time to set up the problem is significant relative to the time required to solve it.

So far t_most_w_ds is the best I've encountered. On my machine the setup time is cut from 12-13s down to 7-8s, with a solve time is around 12s. I'll check the sparse c vector thing again to see if I can observe the improvements you mention.

dmuldrew · 2019-07-10T21:56:47Z

Yes, between the two implementations there does appear to be a run-time transition point. It appears to be a function of number of sparse matrices to be summed (and maybe density of the matrices?).

Here's a toy model I used to test this:
https://gist.github.com/dmuldrew/3c167b6bf3b598c7b4e5818e6b5ed7d6

I'll ask about the possibility of sharing a mdi or mpopt file.

rdzman · 2019-07-10T22:19:29Z

Interesting! I haven't played with that code yet (and may not get a chance to anytime soon), but I wonder how the plot changes for smaller or larger values of N.

Also, any idea what the corresponding values of numLoops and N would be for your MOST runs?

dmuldrew · 2019-07-10T22:43:20Z

For us, the value of om.lin.NS is ~1300. Individual Ak linear constraint matrices approximated as square matrices that have values N=4300 and density of 0.00023.

Plugging these parameters into the toy model gives:
numLoops: 1300
ConstructSparse: 1.3257 seconds
AddSparse: 43.1034 seconds

The transition point for these Ak is somewhere around numLoops ~40

I'm guessing that the transition point occurs at lower numLoop values when the total # of nonzeros in the Ak matrices increases.

Code analyzer says it's faster.

rdzman · 2019-08-15T21:35:26Z

I confess I had a hard time deciding when to use this new method and when not to. For large numbers of constraint and cost sets with large matrices, it's a clear win. However, when that's not the case, I found the timing to vary a lot.

In the end I went with the new method when the number of constraint or cost sets (number of loops) was greater than 100 (or greater than 25 if the matrix had more than 300 cols). The attached file (pr70.zip) has some data and code I used in the process (saved here for posterity).

I'll be re-basing my changes on the latest master and pushing them here momentarily.

- @opt_model/params_lin_constraint() - @opt_model/params_quad_cost() Should give major boost to setup phase for some MOST runs. Note: Similar refactoring in the following functions does not result in improved performance: - @opt_model/eval_nln_constraint() - @opt_model/eval_nln_constraint_hess() - @opt_model/eval_nln_cost() - @opt_model/params_legacy_cost() Ref: MATPOWER#70 (+8 squashed commits) Squashed commits: [5a0723e] Fix several issues in new implementation of params_quad_cost() - Q specified as column vector was not being handled - varsets were only being used to order columns, but they need to order rows as well - c is kept as dense vector [8f12a17] Fix issue with single-row constraints causing vertcat dimension mismatch error. [2fc302b] Include single row constraint in t_opf_model(). - Uncovers fatal error in new implementation of @opt_model/params_lin_constraint(). [4cffa8d] docs: added more comments to changed sections [7cc64b6] refactor: update params_quad_cost.m with faster sparse matrix summations [41ae817] refactor: update params_lin_constraint.m with faster sparse matrix summations

Ref: MATPOWER#70

dmuldrew · 2019-08-15T23:10:46Z

Glad to see this method incorporated into master and that the performance improvements will help others out:
MATPOWER/most#7

I'm planning to submit a second PR with a few other sparse matrix tweaks we're also using...

lordleoo · 2019-08-16T07:36:45Z

Glad to see this method incorporated into master and that the performance improvements will help others out:
MATPOWER/most#7

I'm planning to submit a second PR with a few other sparse matrix tweaks we're also using...

Thanks a lot; You're a life saver
I look forward to your future PRs. I can use any speed-up tweak I can get.

rdzman mentioned this pull request Aug 9, 2019

Large MOST models (nt > 1000) MATPOWER/most#7

Closed

Use numel() in place of prod(size()).

638761f

Code analyzer says it's faster.

dmuldrew and others added 2 commits August 15, 2019 16:37

Apply new sparse matrix construction selectively.

ff015af

Ref: MATPOWER#70

rdzman force-pushed the RefactorSparseSummation branch from 1318651 to ff015af Compare August 15, 2019 21:39

rdzman merged commit ff015af into MATPOWER:master Aug 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor sparse summation and indexing #70

Refactor sparse summation and indexing #70

dmuldrew commented Jul 3, 2019 •

edited

Loading

rdzman commented Jul 5, 2019

dmuldrew commented Jul 5, 2019 •

edited

Loading

dmuldrew commented Jul 8, 2019

rdzman commented Jul 9, 2019

rdzman commented Jul 9, 2019

dmuldrew commented Jul 9, 2019 •

edited

Loading

dmuldrew commented Jul 9, 2019 •

edited

Loading

rdzman commented Jul 10, 2019

dmuldrew commented Jul 10, 2019 •

edited

Loading

rdzman commented Jul 10, 2019

dmuldrew commented Jul 10, 2019 •

edited

Loading

rdzman commented Aug 15, 2019

dmuldrew commented Aug 15, 2019

lordleoo commented Aug 16, 2019

Refactor sparse summation and indexing #70

Refactor sparse summation and indexing #70

Conversation

dmuldrew commented Jul 3, 2019 • edited Loading

rdzman commented Jul 5, 2019

dmuldrew commented Jul 5, 2019 • edited Loading

dmuldrew commented Jul 8, 2019

rdzman commented Jul 9, 2019

rdzman commented Jul 9, 2019

dmuldrew commented Jul 9, 2019 • edited Loading

dmuldrew commented Jul 9, 2019 • edited Loading

rdzman commented Jul 10, 2019

dmuldrew commented Jul 10, 2019 • edited Loading

rdzman commented Jul 10, 2019

dmuldrew commented Jul 10, 2019 • edited Loading

rdzman commented Aug 15, 2019

dmuldrew commented Aug 15, 2019

lordleoo commented Aug 16, 2019

dmuldrew commented Jul 3, 2019 •

edited

Loading

dmuldrew commented Jul 5, 2019 •

edited

Loading

dmuldrew commented Jul 9, 2019 •

edited

Loading

dmuldrew commented Jul 9, 2019 •

edited

Loading

dmuldrew commented Jul 10, 2019 •

edited

Loading

dmuldrew commented Jul 10, 2019 •

edited

Loading