
Improve computation time of compute_matrix for parametric ops #2759

Merged: 24 commits merged into master on Jul 27, 2022

Conversation

dwierichs (Contributor)

With the support of parameter broadcasting in #2609 , the compute_matrix method of many parametric operations was updated.
Unfortunately, the resulting methods are sometimes considerably slower than the previous version.
This PR rewrites compute_matrix for a series of these operations, reducing the slowdown and even achieving speedups compared to the versions that didn't support broadcasting. At the same time, the new methods are more compact.

One key change is heavy usage of qml.math.einsum instead of nested qml.math.stack calls. This reduces the calls to autoray, is easy to write in a broadcastable manner, and results in shorter code.

qml.math.einsum("...,i->...i", x, y) # scalar*vector or outer product between vectors
qml.math.einsum("...i,il->...il", x, y) # broadcastable version of [_x * _y for _x, _y in zip(x, y)]
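As an illustration of these two signatures (plain NumPy standing in for qml.math here, with made-up variable names), the ellipsis lets one signature cover both a scalar and a batched first argument:

```python
import numpy as np

pm = np.array([-1.0, 1.0])
theta = np.array([0.1, 0.2, 0.3])  # batch of 3 parameters
phi = 0.3                          # single scalar parameter

# "...,i->...i": an outer product that degrades gracefully to scalar*vector
batched = np.einsum("...,i->...i", theta, pm)  # shape (3, 2)
single = np.einsum("...,i->...i", phi, pm)     # shape (2,)

# "...i,il->...il": scales the rows of a matrix by the entries of the
# (possibly batched) first argument
mat = np.arange(4.0).reshape(2, 2)
x = np.array([[0.1, 0.2], [0.3, 0.4]])         # batch of 2, i = 2
rows = np.einsum("...i,il->...il", x, mat)     # shape (2, 2, 2)
```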

codecov bot commented Jun 21, 2022

Codecov Report

Merging #2759 (0c49888) into master (4837dd7) will not change coverage (+0.00%).
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master    #2759    +/-   ##
========================================
  Coverage   99.63%   99.63%            
========================================
  Files         252      252            
  Lines       20772    20986   +214     
========================================
+ Hits        20696    20910   +214     
  Misses         76       76            
Impacted Files Coverage Δ
pennylane/ops/qubit/attributes.py 100.00% <ø> (ø)
pennylane/_qubit_device.py 99.52% <100.00%> (+0.06%) ⬆️
pennylane/devices/default_mixed.py 100.00% <100.00%> (ø)
pennylane/devices/default_qubit.py 100.00% <100.00%> (ø)
pennylane/devices/default_qubit_autograd.py 100.00% <100.00%> (ø)
pennylane/devices/default_qubit_jax.py 100.00% <100.00%> (ø)
pennylane/devices/default_qubit_tf.py 97.16% <100.00%> (+1.85%) ⬆️
pennylane/devices/default_qubit_torch.py 92.45% <100.00%> (+0.14%) ⬆️
pennylane/ops/qubit/parametric_ops.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mlxd mlxd requested review from AmintorDusko and mlxd June 21, 2022 14:38
@albi3ro albi3ro requested review from albi3ro June 21, 2022 18:44
albi3ro (Contributor) commented Jun 21, 2022

@dwierichs One important question I have about this, which will affect performance optimization choices: How often will parameter-broadcasting be used by a standard user? Do we want to make everything a little bit slower so we can make parameter-broadcasting faster? Or should we optimize the standard use-case at the expense of parameter-broadcasting?

I'm also a bit concerned about the readability, and thus maintainability, of this code. These operations are highly trafficked by external contributors and new developers. If we do use non-standard behaviour, like the einsum broadcasting, that needs to be thoroughly explained as close to the code as possible.

mlxd (Member) commented Jun 21, 2022

Hi @dwierichs. I agree with Christina's points, and wonder whether, instead of using einsum with the ellipsis to access broadcasting directly, we can make the broadcasting calls ourselves, removing the dependence on einsum altogether.

Einsum's syntax is a sizeable maintenance burden in my eyes, and if we can do things with direct calls instead, I think we should.

dwierichs (Contributor, Author)

Thanks for the feedback, @albi3ro and @mlxd ! :)

Do we want to make everything a little bit slower so we can make parameter-broadcasting faster?

As far as I am aware, master (and v0.24.0) already carries the slowdown from allowing broadcasting, introduced by #2609. This PR basically speeds up all scenarios, as far as my timings went (although I have to admit that, for some reason, standard timing with process_time() and lots of iterations was surprisingly unstable...).

If we do use non-standard behaviour, like the einsum broadcasting, that needs to be thoroughly explained as close to the code as possible.

That's an excellent point :) Since basically only the two signatures "...,i->...i" and "...i,il->...il" are used, maybe one could make them utility functions within parametric_ops that carry extensive docstrings?

whether, instead of using einsum with the ellipsis to access broadcasting directly, we can make the broadcasting calls ourselves, removing the dependence on einsum altogether.

I am not entirely sure I understand the alternative here? Would you manually differentiate between broadcasted and not broadcasted within the method, and have an if-else-loop for the distinction? :)

Einsum's syntax is a sizeable maintenance burden in my eyes, and if we can do things with direct calls instead I think we should

Thanks again for the feedback, I wasn't aware of this :)

mlxd (Member) commented Jun 21, 2022

I am not entirely sure I understand the alternative here? Would you manually differentiate between broadcasted and not broadcasted within the method, and have an if-else-loop for the distinction? :)

I guess what I was trying to map out is that, instead of using the native broadcasting that is part of einsum (enabled by the ellipsis, according to the docs), since we already know the structure of the gates we want, why not call the ops we want directly.

A poor example, but it does demonstrate a speed-up over einsum for me:

import pennylane as qml
from timeit import default_timer as timer

theta = qml.numpy.random.rand(20)

theta_scale = 0.5j * theta
pm = qml.numpy.array([-1, 1])

start = timer()
for i in range(10000):
    qml.math.einsum("...,i->...i", theta_scale, pm)
end = timer()
print(f"EINSUM BCAST:t={end-start},res={qml.math.einsum('...,i->...i', theta_scale, pm)}")

start = timer()
for i in range(10000):
    qml.numpy.outer(theta_scale, pm)
end = timer()
print(f"OPS DIRECT:t={end-start},res={qml.numpy.outer(theta_scale, pm)}")

We obtain the same data overall, but the second option is about 2/3 the runtime of the first for me. What do you think?
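The equivalence of the two calls can be checked directly (plain NumPy here in place of qml.numpy, purely for illustration):

```python
import numpy as np

theta_scale = 0.5j * np.linspace(0.0, 1.0, 20)
pm = np.array([-1, 1])

# Both calls compute out[j, i] = theta_scale[j] * pm[i]
a = np.einsum("...,i->...i", theta_scale, pm)
b = np.outer(theta_scale, pm)
assert a.shape == b.shape == (20, 2)
assert np.allclose(a, b)
```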

dwierichs (Contributor, Author) commented Jun 21, 2022

We obtain the same data overall, but the second option is about 2/3 the runtime of the first for me. What do you think?

Ah yes, this is indeed significantly faster.
So if we want a version that supports both, theta being scalar and theta being 1d, what would you propose? Manual check which one it is (via qml.math.ndim I suppose) and using either np.outer or simple multiplication for the two cases, respectively?

I noticed that outer works with (scalar, tensor) inputs when using the NumPy interface (though it produces an excess dimension), but it does not work, for example, with Torch:

import torch
import pennylane as qml
from timeit import default_timer as timer

theta = torch.tensor(0.514)  # scalar parameter this time

theta_scale = 0.5j * theta
pm = torch.tensor([-1, 1])

start = timer()
for i in range(10000):
    qml.math.einsum("...,i->...i", theta_scale, pm)
end = timer()
print(f"EINSUM BCAST:t={end-start},res={qml.math.einsum('...,i->...i', theta_scale, pm)}")

qml.math.outer(theta_scale, pm)  # <-- raises RuntimeError for the 0-d torch input

Edit: It seems that the manual if-else branch is indeed significantly faster, still :) Just want to make sure I'm getting your idea correctly!
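Concretely, that manual branch might look like this minimal sketch (plain NumPy in place of qml.math; diag_scaled is a hypothetical name, not a function from the PR):

```python
import numpy as np

def diag_scaled(theta, pm):
    # Scalar parameter: plain product, same shape as pm.
    if np.ndim(theta) == 0:
        return theta * pm
    # Broadcasted parameter of shape (batch,): outer product,
    # giving shape (batch, len(pm)).
    return np.outer(theta, pm)
```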

mlxd (Member) commented Jun 21, 2022

Edit: It seems that the manual if-else branch is indeed significantly faster, still :) Just want to make sure I'm getting your idea correctly!

Good to know that helps. My overall point is that I am not sure using einsum makes sense here, as we should already know the internal structure of the gates we are creating and should not need to offload to einsum to generate the values --- it is just a fancy wrapper around a dispatcher to more fundamental linalg ops & kernels (with some permutation and reordering steps too). As an example, if the ...,i->...i calling convention simply maps to an outer-product call, why not just call the outer product? The same goes for other uses of the library.

From the NumPy docs, the general einsum syntax maps to other NumPy built-ins (see the correspondence table in the np.einsum documentation; image omitted), and I presume TF, JAX, and Torch do similar.

Now, if we cannot replicate this functionality to work across all the frameworks, but using the einsum calling convention allows us to do so in a portable way that isn't causing overheads, then I think it becomes necessary to support this.

Though, maybe I am missing the bigger picture overall on why to use this.

dwierichs (Contributor, Author) commented Jun 21, 2022

as we should know the internal structure of the gates we are creating already,

As we want to support both broadcasted and scalar parameters, we do not know the internal structure at this point. There always are two scenarios, either producing a (2**num_wires, 2**num_wires) matrix or a (batch_size, 2**num_wires, 2**num_wires) tensor. If I'm getting our conversation right, this is the point at which we think differently? :)

Now, if we cannot replicate this functionality to work across all the frameworks, but using the einsum calling convention allows us to do so in a portable way that isn't causing overheads, then I think it becomes necessary to support this.

When I opened this PR, I thought that

  • the only way to cover both scenarios (broadcasted and scalar) in an all-framework-compatible way is einsum
  • manually distinguishing between the two scenarios might be too cumbersome/slow

But seeing now that manually deciding for theta * pm or qml.math.outer(theta, pm) actually is much faster, I agree that we should not use einsum. :)
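For concreteness, putting the two scenarios together for a diagonal gate, a broadcast-aware RZ-style matrix could be sketched like this (plain NumPy, illustrative only, not the code merged in this PR; valid for real-valued theta):

```python
import numpy as np

def rz_matrix(theta):
    # RZ(theta) = diag(exp(-i*theta/2), exp(+i*theta/2)).
    # Scalar theta -> shape (2, 2); theta of shape (batch,) -> (batch, 2, 2).
    phase = np.exp(-0.5j * np.asarray(theta))
    diag = np.stack([phase, np.conj(phase)], axis=-1)  # shape (..., 2)
    return diag[..., :, None] * np.eye(2)              # shape (..., 2, 2)
```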

mlxd (Member) commented Jun 21, 2022

As we want to support both broadcasted and scalar parameters, we do not know the internal structure at this point. There always are two scenarios, either producing a (2**num_wires, 2**num_wires) matrix or a (batch_size, 2**num_wires, 2**num_wires) tensor. If I'm getting our conversation right, this is the point at which we think differently? :)

Great, thanks for the clarification here!

But seeing now that manually deciding for theta * pm or qml.math.outer(theta, pm) actually is much faster, I agree that we should not use einsum. :)

Yep, sounds like the better choice to me. I still think there are scenarios where einsum is the best fit, but for simplified tensor-ops like we are doing here, I reckon it may be a bazooka to hunt a fly. 🙂

AmintorDusko (Contributor) commented Jun 27, 2022

Hi @dwierichs, thank you for putting this PR together.
I made some suggestions to the RZ gate that may improve performance overall if propagated to all gates.
Please double-check that, with these changes, the edge cases are well covered (like TensorFlow and Torch parameters).

dwierichs (Contributor, Author) left a review comment:

Hi @AmintorDusko ,
Thank you for your review and very helpful suggestions :)
And sorry for the extended silence on this PR.
I left some comments on the RZ code you discussed; I think it's a great idea to do this for one operation and then carry the fastest method over to the other operations (where applicable, of course :)).
Let me know what you think about the update :)

Review threads on pennylane/ops/qubit/parametric_ops.py (outdated, resolved)
AmintorDusko (Contributor)

Hi @dwierichs, thank you for testing the compatibility of my suggestions with other frameworks and adapting them!
I'm happy it helped to improve performance.

dwierichs (Contributor, Author)

Hi @mlxd and @albi3ro
I think this would now be in a review-ready state.
Amintor and I converged on the fastest method for RZ, which I then transferred to the other diagonal operations.
Happy to hear any thoughts you might have on the current version, now that the code changed multiple times :)

mlxd (Member) commented Jul 13, 2022

Hi David, I have done some additional runtime profiling of the new ops compared against v0.24.0, running compute_matrix for 1e5 samples per gate and calculating the mean runtime.

For reference, this script is used to get the runtime for each op

import pennylane as qml
from timeit import default_timer as timer

p = qml.numpy.random.random(100)  # broadcasted case; presumably a scalar p for the non-broadcasted runs below
num_samples = 10000
times = {}

def run_op(op):
    # Pre-run
    op.compute_matrix(p)
    start = timer()
    for i in range(num_samples):
        op.compute_matrix(p)
    end = timer()
    return (end-start)/num_samples

ops = [qml.RX, qml.RY, qml.RZ, qml.CRX, qml.CRY, qml.CRZ, qml.PhaseShift, qml.ControlledPhaseShift]

for op in ops:
    times[op.__name__] = run_op(op)

print({key: f"{t:.3e}" for key,t in times.items()})

For single (aka non-broadcasted) params, we get the following:

# This PR
{'RX': '2.234e-04', 'RY': '2.385e-04', 'RZ': '9.266e-05', 'CRX': '3.635e-04', 'CRY': '3.521e-04', 'CRZ': '9.233e-05', 'PhaseShift': '9.596e-05', 'ControlledPhaseShift': '9.349e-05'}

# v0.24.0
{'RX': '2.039e-04', 'RY': '2.139e-04', 'RZ': '1.906e-04', 'CRX': '3.672e-04', 'CRY': '3.606e-04', 'CRZ': '3.138e-04', 'PhaseShift': '1.768e-04', 'ControlledPhaseShift': '1.154e-04'}

and for broadcasting with 100 params:

# This PR
{'RX': '2.332e-04', 'RY': '2.457e-04', 'RZ': '8.272e-05', 'CRX': '3.787e-04', 'CRY': '3.755e-04', 'CRZ': '9.163e-05', 'PhaseShift': '8.159e-05', 'ControlledPhaseShift': '9.045e-05'}

# v0.24.0
{'RX': '2.239e-04', 'RY': '2.445e-04', 'RZ': '1.929e-04', 'CRX': '3.474e-04', 'CRY': '3.589e-04', 'CRZ': '3.543e-04', 'PhaseShift': '1.875e-04', 'ControlledPhaseShift': '3.025e-04'}

While we get a speedup in some of the ops (especially RZ), we see a ~10% slowdown on RX and RY pretty much consistently across the board. While the slowdown isn't huge (on the order of ~10 ms), I don't think this is an ideal scenario to be in, especially for larger systems making many calls to these ops.

dwierichs (Contributor, Author)

Hey Lee,
thanks for taking the time(s)! :)
I ran your code against master instead of v0.24.0:

## not broadcasted
# on master
{'RX': '4.312e-05', 'RY': '4.020e-05', 'RZ': '6.145e-05', 'CRX': '9.097e-05', 'CRY': '9.188e-05', 'CRZ': '9.558e-05', 'PhaseShift': '5.222e-05', 'ControlledPhaseShift': '2.979e-05'}

# this PR
{'RX': '4.619e-05', 'RY': '4.198e-05', 'RZ': '1.955e-05', 'CRX': '9.278e-05', 'CRY': '8.917e-05', 'CRZ': '2.197e-05', 'PhaseShift': '1.935e-05', 'ControlledPhaseShift': '1.924e-05'}

## broadcasted
# on master
{'RX': '1.890e-04', 'RY': '1.915e-04', 'RZ': '1.896e-04', 'CRX': '3.114e-04', 'CRY': '3.227e-04', 'CRZ': '2.974e-04', 'PhaseShift': '1.576e-04', 'ControlledPhaseShift': '2.744e-04'}

# this PR
{'RX': '2.012e-04', 'RY': '2.038e-04', 'RZ': '7.063e-05', 'CRX': '3.341e-04', 'CRY': '3.455e-04', 'CRZ': '7.834e-05', 'PhaseShift': '7.283e-05', 'ControlledPhaseShift': '8.079e-05'}

Now you might ask why I'm posting this. It is because, as far as I am aware, this PR does not actually touch the code of RX, RY, CRX or CRY, only that of diagonal operations. I'm therefore a bit confused by the fairly consistent slowdown for these operations (CRY without broadcasting being the only exception in these times), which hypothetically should not be affected at all compared to master 🤔

mlxd (Member) commented Jul 13, 2022

> Now you might ask why I'm posting this. It is because this PR does not actually touch the code of RX, RY, CRX or CRY as far as I am aware, but only of diagonal operations. [... full timings quoted from the previous comment]

Hmmm, this is an odd one indeed. I think we need to do some call-stack probing for this. Let me try it out and see if I can find some differences.

mlxd (Member) commented Jul 13, 2022

I've tried to inspect cache usage by the implementations, and for whatever reason I am seeing more L2 cache misses locally with this PR than with v0.24 or master. However, this could be an artifact of something else, and it seems to even out in the large-sample limit (1e5), at least for me. I wonder if there is some prefetching happening at the NumPy (C) level that is affecting this, or maybe something foolish like background processes for us both. All in all, I think this is worth ignoring, as it does not pose a major risk, provided we can accept a potential swing of up to 10% for these ops, which I assume will be a small fraction of actual calls in a given workflow.

If anybody else has any thoughts or comments on this, I'd welcome their suggestions.

I would like to see some timing runs of the test suite for this PR against master/v0.24.0 for the affected operations, but otherwise there will be no other blockers here from me.

AmintorDusko (Contributor) left a review comment:

Nice work, @dwierichs!

dwierichs (Contributor, Author)

Thanks all!
I will wait for @albi3ro to have a chance to look at this and merge afterwards :)

dwierichs added a commit that referenced this pull request Jul 18, 2022
albi3ro (Contributor) left a review comment:

The outer product is also much more readable than einsum, so that's a double boost to using it.

I also like having the if qml.math.ndim(theta) == 0: statements for readability, as those cases are often simpler code. That allows us to understand the non-broadcasting behaviour more easily.

Good improvement, and thanks performance team!

@dwierichs dwierichs merged commit 465642b into master Jul 27, 2022
@dwierichs dwierichs deleted the speedup-compute_matrix branch July 27, 2022 20:11
dwierichs added a commit that referenced this pull request Aug 3, 2022
* introduce Operator.ndim_params, Operator.batch_size, QuantumTape.batch_size

* linting

* changelog

* enable tf.function input_signature usage

* black

* test for unsilenced error

* Apply suggestions from code review

Co-authored-by: Josh Izaac <josh146@gmail.com>

* introduce device flag and batch_transform for unbroadcasting; use transform in device.batch_transform

* black, [skip ci]

* code review

* string formatting [skip ci]

* operation broadcasting interface tests

* unbroadcast_expand

* tests for expand function

* tests

* black

* compatibility with TensorFlow 2.6

* builtins unstack

* failing case coverage

* stop using I in operation.py [skip ci]

* commit old changes

* Apply suggestions from code review

Co-authored-by: Josh Izaac <josh146@gmail.com>

* intermed

* review

* Apply suggestions from code review

Co-authored-by: Josh Izaac <josh146@gmail.com>

* review [skip ci]

* move changelog section from "improvements" to "new features"

* changelog

* add missing files

* clean up, move broadcast dimension first

* namespace

* linting variable names

* update tests that manually set ndim_params for default ops

* pin protobuf<4.21.0

* improve shape coersion order

* first features

* fix sample selection

* pin protobuf<4.21.0

* changelog formatting

* more testing, better batch_size

* fix tf function batch_size

* broadcasted pow tests

* attribute test, ControlledQubitUnitary update

* more tf support

* test kwargs attributes

* Apply suggestions from code review

Co-authored-by: Josh Izaac <josh146@gmail.com>

* changelog

* review

* remove prints

* explicit attribute supports_broadcasting tests

* tests disentangle

* fix

* PauliRot broadcasted identity compatible with TF

* rename "batched" into "broadcasted" for uniform namespace

* old TF version support in qubitunitary unitarity check

* python3.7 support

* extend support, flip flag

* fix _apply_unitary when applying unbroadcasted operator to broadcasted state

* tests default_qubit

* speed up _get_batch_size by a factor 10

* [skip ci] ndim adaptation, fix dimension in _get_bathc_size call

* default qubit additional broadcasting support

* default_qubit broadcasted tests start

* most of test suite

* black

* complete default qubit tests (without interfaces)

* tmp autograd done, jax partially

* tf tests

* testing

* black

* linting

* tests

* black

* tf fix

* minor changes

* use get_batch_size

* some TF tests

* tf test finished

* torch tests

* changelog

* move _get_batch_size to DefaultQubit, give QubitDevice dummy _get_batch_size, tests

* linting, old tf version support

* test coverage

* test fix

* mark jax test

* self- review

* Apply suggestions from code review

* typo

* changelog example

* speedup par ops

* qchem matrices

* parametric ops eigvals

* linting

* lint

* changelog

* id addition

* isingxy

* isingxy tests

* black

* mention speedup in changelog

* singleexcitation fix

* add ndim_params properties and add qchem ops to attribute

* attribute and test

* Revert "merge"

This reverts commit 55cb10d, reversing
changes made to 1cac187.

* mask lookups

* black

* remove unnecessary eigvals

* remove speedups to put them in separate PR

* tmp

* converged matrix methods, priority for scalar speedups

* black

* lint

* exclude isingxy changes (duplicate from #2759)

* revert generator changes

* dtype namespace

* unfying functions for qchem compute_matrix methods in Single- and DoubleExcitations

* black

Co-authored-by: Josh Izaac <josh146@gmail.com>