
Faster speed for compare_survival #215

Closed · wants to merge 2 commits

Conversation

@raynardj (Contributor) commented on Jul 9, 2021


What does this implement/fix? Explain your changes

  • The change improves the speed of sksurv.compare.compare_survival.
  • It uses more native numpy syntax/operations to replace the Python iteration over a potentially huge cartesian product of groups (a sketch follows this list).
  • The actual scale of the improvement depends on how large n_groups is.
  • Per testing, the function returns the same chisq, pvalue, covariance matrix, and table as before.
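
For illustration, here is a minimal sketch of the kind of change this PR makes. It is not the actual sksurv.compare source and the two function names are hypothetical; the names at_risk, multiplier, total_at_risk, and covar follow the diff snippets further down, and the surrounding loop over event times is omitted.

```python
import numpy


def update_covar_loop(covar, at_risk, multiplier, total_at_risk):
    """Old style: Python double loop over the group x group cartesian product."""
    n_groups = at_risk.shape[0]
    for g1 in range(n_groups):
        temp = at_risk[g1] * multiplier
        covar[g1, g1] += temp
        for g2 in range(n_groups):
            covar[g1, g2] -= temp * at_risk[g2] / total_at_risk
    return covar


def update_covar_vectorized(covar, at_risk, multiplier, total_at_risk):
    """New style: the same update expressed with numpy broadcasting only."""
    temp = at_risk * multiplier
    covar[numpy.diag_indices_from(covar)] += temp
    covar -= numpy.outer(temp, at_risk) / total_at_risk
    return covar
```

Both versions produce the same covar; the vectorized one avoids n_groups ** 2 Python-level iterations per event time.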

@sebp (Owner) left a comment

Thanks for your help!

The algorithm looks good to me; I would just use a couple more numpy functions.

@@ -75,6 +75,7 @@ def compare_survival(y, group_indicator, return_stats=False):
     observed = numpy.zeros(n_groups, dtype=numpy.int_)
     expected = numpy.zeros(n_groups, dtype=numpy.float_)
     covar = numpy.zeros((n_groups, n_groups), dtype=numpy.float_)
+    group_eye = numpy.eye(n_groups, dtype=bool)
@sebp (Owner) commented:

Since you are only using it for indexing covar, using numpy.diag_indices makes more sense to me.

@raynardj (Contributor, Author) replied:

I didn't know such a thing existed. Sure, that uses way less memory.
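
As a quick illustration of the difference (the example values below are made up): numpy.eye allocates a full n_groups x n_groups boolean mask just to pick out the diagonal, while numpy.diag_indices only creates two length-n_groups index arrays.

```python
import numpy

n_groups = 4
covar = numpy.zeros((n_groups, n_groups))
temp = numpy.arange(n_groups, dtype=float)

# Boolean mask: an n_groups x n_groups array used only for indexing.
group_eye = numpy.eye(n_groups, dtype=bool)
a = covar.copy()
a[group_eye] += temp

# diag_indices: two small index arrays of length n_groups instead.
b = covar.copy()
b[numpy.diag_indices(n_groups)] += temp

assert numpy.array_equal(a, b)
```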

- covar[g1, g2] -= temp * at_risk[g2] / total_at_risk
+ temp = at_risk * multiplier
+ covar[group_eye] += temp
+ covar -= (temp[:, None] * at_risk[None, :] / total_at_risk)
@sebp (Owner) commented:

You don't need the brackets, and temp[:, None] * at_risk[None, :] is the same as numpy.outer(temp, at_risk).
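
A quick check of that equivalence, with arbitrary example values:

```python
import numpy

temp = numpy.array([1.0, 2.0, 3.0])
at_risk = numpy.array([10.0, 20.0, 30.0])

# Broadcasting a column against a row gives the same (3, 3) matrix as numpy.outer.
assert numpy.allclose(temp[:, None] * at_risk[None, :], numpy.outer(temp, at_risk))
```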

@codecov (bot) commented on Jul 10, 2021

Codecov Report

Merging #215 (391c11d) into master (c4fb764) will decrease coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #215      +/-   ##
==========================================
- Coverage   98.33%   98.33%   -0.01%     
==========================================
  Files          37       37              
  Lines        3127     3126       -1     
  Branches      460      458       -2     
==========================================
- Hits         3075     3074       -1     
  Misses         28       28              
  Partials       24       24              
Impacted Files Coverage Δ
sksurv/compare.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@raynardj requested a review from sebp on July 10, 2021, 09:56
sebp added a commit that referenced this pull request on Jul 11, 2021
@sebp (Owner) commented on Jul 11, 2021

Thanks! Merged in commit ccc6913

@sebp closed this on Jul 11, 2021
@raynardj deleted the faster-compare-survival branch on July 11, 2021, 12:26
Development
Successfully merging this pull request may close these issues: Accelerate compare_survival function