use thread-local register file for matmul speedups #205

Merged

jart merged 12 commits into Mozilla-Ocho:main from ahgamut:thread-local

Jan 18, 2024

Commits on Jan 16, 2024

attempt thread-local variables for SGEMM

ahgamut committed Jan 16, 2024 (6eb9303)

remove the BK check

ahgamut committed Jan 16, 2024 (1413c3a)

fix memory error with sgemm
```
adding back the BK check
```
ahgamut committed Jan 16, 2024 (d33de16)

separate reading A and B

ahgamut committed Jan 16, 2024 (8b0228c)

writeback to C in one go

ahgamut committed Jan 16, 2024 (184d203)

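The idea behind this commit and the PR title can be illustrated with a host-side C++ sketch; this is not the PR's CUDA code. Each (i0, j0) tile stands in for one GPU thread: it accumulates a TM x TN block of results in a small local array, which plays the role of the thread's registers, and writes the block to C once at the end. The function name `matmul_tiled`, the row-major layout, and the requirement that TM and TN divide the matrix dimensions are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tile sizes TM/TN; the PR tunes its own template parameters.
template <int TM, int TN>
void matmul_tiled(const std::vector<float>& A,  // m x k, row major (assumed)
                  const std::vector<float>& B,  // k x n, row major (assumed)
                  std::vector<float>& C,        // m x n, row major (assumed)
                  int m, int n, int k) {
    assert(m % TM == 0 && n % TN == 0);
    assert(static_cast<int>(A.size()) == m * k);
    assert(static_cast<int>(B.size()) == k * n);
    assert(static_cast<int>(C.size()) == m * n);

    // Each (i0, j0) pair plays the role of one GPU thread.
    for (int i0 = 0; i0 < m; i0 += TM) {
        for (int j0 = 0; j0 < n; j0 += TN) {
            float acc[TM * TN] = {0.0f};  // stands in for thread-local registers

            for (int l = 0; l < k; ++l) {
                // Read the needed slices of A and B separately.
                float a[TM];
                float b[TN];
                for (int i = 0; i < TM; ++i) a[i] = A[(i0 + i) * k + l];
                for (int j = 0; j < TN; ++j) b[j] = B[l * n + (j0 + j)];

                // Outer product accumulated entirely in the local array.
                for (int i = 0; i < TM; ++i)
                    for (int j = 0; j < TN; ++j)
                        acc[i * TN + j] += a[i] * b[j];
            }

            // Single writeback of the finished tile to C.
            for (int i = 0; i < TM; ++i)
                for (int j = 0; j < TN; ++j)
                    C[(i0 + i) * n + (j0 + j)] = acc[i * TN + j];
        }
    }
}
```

Calling `matmul_tiled<2, 2>(A, B, C, m, n, k)` touches each element of C exactly once, after all k partial products for that element have been summed locally.
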
update matmul_block2d to also use TM/TN

ahgamut committed Jan 16, 2024 (61946c3)

use half everywhere for matmul_block2d
```
and some template param tuning
```
ahgamut committed Jan 16, 2024 (e05b434)

read A into As as column major
```
A is laid out in column major in global mem
```
ahgamut committed Jan 16, 2024 (a314754)

Commits on Jan 18, 2024

moving Ctype check into compile-time
```
it's now outside the kernel, so we check only once instead of however
many times. However, this bloats binary size.

also changed template parameters a bit
```
ahgamut committed Jan 18, 2024 (44cc475)

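A rough host-side C++ sketch of the trade-off this commit message describes; it is not the PR's code. The output type becomes a template parameter, so the type tag is inspected once in the launcher instead of inside the kernel body, at the cost of one instantiated copy of the kernel per type. The names `CType`, `write_output`, and `launch` are hypothetical, and `float`/`double` are stand-ins for whichever element types the real kernel supports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical runtime tag for the output element type.
enum class CType { F32, F64 };

// "Kernel" body: T is fixed at compile time, so the store needs no
// runtime branch on the type tag.
template <typename T>
void write_output(void* C, const std::vector<float>& acc) {
    T* out = static_cast<T*>(C);
    for (std::size_t i = 0; i < acc.size(); ++i) {
        out[i] = static_cast<T>(acc[i]);
    }
}

// Launcher: the tag is inspected once, outside the kernel. Each case
// instantiates its own copy of write_output, which is where the extra
// binary size comes from.
bool launch(CType type, void* C, const std::vector<float>& acc) {
    assert(C != nullptr);
    switch (type) {
        case CType::F32: write_output<float>(C, acc); return true;
        case CType::F64: write_output<double>(C, acc); return true;
    }
    return false;  // unknown tag
}
```
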
separate zeroing out and reading from gmem
```
since we know that As and Bs are laid out one after another in memory
(i.e. they are basically svals), and the overall dimension is (BM + BN) *
BK, we just write one "nested" loop that does the zeroing
```
ahgamut committed Jan 18, 2024 (13bab2c)

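The layout argument in this commit message can be sketched in host-side C++ as follows; this is not the PR's code. Because As and Bs sit back to back in one buffer of (BM + BN) * BK floats, a single pass over that buffer zeroes both tiles. The block sizes, the helper names `zero_shared` and `split`, and the flat strided loop (the commit describes a "nested" loop) are assumptions made for illustration; `tid` and `nthreads` simulate a thread's index and the block's thread count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical block sizes; the PR uses its own template parameters.
constexpr int BM = 4;
constexpr int BN = 3;
constexpr int BK = 2;

// Simulates one thread of a block: thread `tid` out of `nthreads` zeroes
// its strided share of the combined As+Bs buffer.
void zero_shared(float* svals, int tid, int nthreads) {
    assert(svals != nullptr);
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    for (int i = tid; i < (BM + BN) * BK; i += nthreads) {
        svals[i] = 0.0f;
    }
}

struct Tiles {
    float* As;  // first BM * BK floats of the buffer
    float* Bs;  // next BN * BK floats, directly after As
};

// Carves the two tiles out of the single contiguous buffer.
Tiles split(float* svals) {
    return Tiles{svals, svals + BM * BK};
}
```
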
move sharedmem pointers into __device__ matmul
```
they're not used in the __global__ functions anyway
```
ahgamut committed Jan 18, 2024 (494fb8e)

remove matmul32_block2d
```
it's now a specialization of matmul_block2d
```
ahgamut committed Jan 18, 2024 (fc8245a)
