v0.0.10

awni released this 18 Jan 20:02


f6e911c

Highlights:

Faster matmul: up to 2.5x faster for certain sizes (see benchmarks)
Fused matmul + addition (for faster linear layers)

Core

Quantization supports sizes other than multiples of 32
Faster GEMM (matmul)
AddMM primitive (fused addition and matmul)
mx.isnan, mx.isinf, mx.isposinf, mx.isneginf
mx.tile
VJPs for scatter_min and scatter_max
Multi output split primitive
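As a rough illustration of what the fused addition and matmul item computes, here is a minimal pure-Python reference of the math (c + a @ b). The helper names `matmul` and `add_matmul` are hypothetical and exist only for this sketch. This is not the MLX kernel, which performs the two steps in a single fused operation.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_matmul(c, a, b):
    """Reference semantics of a fused add + matmul: c + a @ b.

    Hypothetical helper for illustration; a fused primitive would avoid
    materializing the intermediate product as a separate array.
    """
    prod = matmul(a, b)
    return [[ci + pi for ci, pi in zip(crow, prow)]
            for crow, prow in zip(c, prod)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]  # plays the role of a bias term in a linear layer
result = add_matmul(c, a, b)
```

In a linear layer, `a` would be the input, `b` the transposed weights, and `c` the broadcast bias.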

NN

Losses: Gaussian negative log-likelihood
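For reference, a minimal pure-Python sketch of the Gaussian negative log-likelihood for a single prediction, assuming the common definition 0.5 * (log(var) + (target - mean)^2 / var). The function names, the `full` flag, and the `eps` clamp are assumptions made for this sketch and are not taken from the release.

```python
import math


def gaussian_nll(mean, target, var, full=False, eps=1e-6):
    """Gaussian NLL for one scalar prediction (illustrative sketch).

    `eps` clamps the variance away from zero; `full` adds the constant
    0.5 * log(2 * pi) term. Both are assumptions of this sketch.
    """
    var = max(var, eps)
    loss = 0.5 * (math.log(var) + (target - mean) ** 2 / var)
    if full:
        loss += 0.5 * math.log(2 * math.pi)
    return loss


def mean_gaussian_nll(means, targets, variances, full=False):
    """Average the per-element losses (mean reduction)."""
    losses = [gaussian_nll(m, t, v, full)
              for m, t, v in zip(means, targets, variances)]
    return sum(losses) / len(losses)


batch_loss = mean_gaussian_nll([0.0, 0.0], [0.0, 2.0], [1.0, 1.0])
```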

Misc

Performance enhancements for graph evaluation with lots of outputs
Default PRNG seed is based on current time instead of 0
Primitive VJPs take the primitive's outputs as input, which reduces redundant work without relying on graph simplification
Booleans print in Python style (True/False) when printed from Python
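To illustrate why passing a primitive's output to its VJP can save work, here is a small sketch using tanh, whose derivative 1 - tanh(x)^2 can be written entirely in terms of the forward output. The function names are hypothetical and this is pure Python, not MLX's internal VJP interface.

```python
import math


def tanh_vjp_from_input(x, cotangent):
    """VJP that only sees the input, so it must recompute tanh(x)."""
    y = math.tanh(x)  # redundant: the forward pass already computed this
    return cotangent * (1.0 - y * y)


def tanh_vjp_from_output(y, cotangent):
    """VJP that receives the forward output y = tanh(x) and reuses it."""
    return cotangent * (1.0 - y * y)


x = 0.3
y = math.tanh(x)  # forward pass
grad_recomputed = tanh_vjp_from_input(x, 1.0)
grad_reused = tanh_vjp_from_output(y, 1.0)
```

Both versions give the same gradient; the second avoids evaluating tanh a second time.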

Bugfixes

Fix scatter for types with less than 32-bit precision, and fix an integer overflow
Fix overflow in mx.eye
Report Metal out of memory issues instead of silent failure
Change mx.round to follow NumPy which rounds to even
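As a quick illustration of the round-half-to-even rule mentioned for mx.round, the sketch below uses Python's built-in `round`, which also rounds halves to the nearest even integer, as a stand-in. The `round_half_away` helper is hypothetical and shows the alternative rule (halves rounded away from zero) for contrast.

```python
import math


def round_half_away(x):
    """Round halves away from zero (the rule that half-to-even replaces)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


values = [0.5, 1.5, 2.5, -0.5, -1.5]

# Python's built-in round() rounds exact halves to the nearest even integer.
half_even = [round(v) for v in values]        # [0, 2, 2, 0, -2]
half_away = [round_half_away(v) for v in values]  # [1.0, 2.0, 3.0, -1.0, -2.0]
```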
