Replies: 1 comment
-
einops.einsum is merely a facade over torch.einsum. AFAIR, torch.einsum currently includes opt_einsum and, I assume, optimizes the order of execution by default. If there are any problems in your code with memory allocation, they almost certainly happen in the last einsum. I'd recommend
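Since the reply points at the contraction order chosen by the optimizer, one way to investigate is to inspect the planned path and the size of the largest intermediate tensor. A minimal sketch using numpy's analogue of this machinery (the shapes and subscripts are hypothetical stand-ins; the original MoE code is not shown in the post):

```python
import numpy as np

# Hypothetical shapes standing in for the MoE tensors in the report.
b, k, d, h = 32, 8, 64, 128
x = np.random.rand(b, d)       # tokens
w1 = np.random.rand(k, d, h)   # per-expert weights
g = np.random.rand(b, k)       # gating scores

# einsum_path reports the contraction order and the largest intermediate,
# which is where an "attempted to allocate N GiB" error would originate.
path, info = np.einsum_path('bd,kdh,bk->bh', x, w1, g, optimize='optimal')
print(info)  # the report includes a "Largest intermediate" line
```

The printed report shows whether a huge intermediate is planned before the final contraction, which would explain an allocation failure in the last einsum.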
-
I am working on implementing SMoE for Mixtral and have found the following bug. When performing einops.einsum with multiple tensors at once, the code fails at large batch sizes (w=9*K and above), raising an error that it attempts to allocate 1008 GiB of memory. This scales linearly, so w=18*K gives 2016 GiB. However, values of w=8*K and below execute properly, even though they "should" be trying to allocate equally unreasonable amounts of memory. When I implement the matrix operations separately, the code executes without memory errors, even with very large values like w=96*K. Could the multiple-tensor memory / arrangement algorithm be improved to solve this error?
Cheers.
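The workaround described above, decomposing one multi-operand einsum into explicit pairwise steps, can be sketched as follows. The shapes and subscripts are assumptions for illustration (the original Mixtral SMoE code is not included in the post); the point is that the pairwise form bounds the intermediate to a tensor you choose explicitly, rather than whatever the path optimizer picks:

```python
import numpy as np

# Assumed shapes; small here so the sketch runs quickly.
b, k, d, h = 16, 8, 32, 64
x = np.random.rand(b, d)       # tokens
w1 = np.random.rand(k, d, h)   # per-expert weights
g = np.random.rand(b, k)       # gating scores

# One fused three-operand einsum.
fused = np.einsum('bd,kdh,bk->bh', x, w1, g)

# The same contraction as two pairwise steps, with the intermediate
# limited to the explicitly chosen (b, k, h) tensor.
per_expert = np.einsum('bd,kdh->bkh', x, w1)
stepwise = np.einsum('bkh,bk->bh', per_expert, g)

assert np.allclose(fused, stepwise)
```

Both forms compute the same result; only the size of the intermediate buffers differs, which is consistent with the observation that the separate matrix operations avoid the allocation error.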