Diff with StreamingLLM #20
pfeatherstone started this conversation in General
Replies: 1 comment · 2 replies
-
I don't have experiments to back it up, but my intuition is that RMT is closer to a real solution, whereas attention sinks are a hack to make standard decoders work.
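For context, the "attention sinks" idea amounts to a KV-cache eviction policy: always retain the first few token entries (the sinks) plus a sliding window of recent tokens, and drop everything in between. A minimal sketch of that policy (names and parameters here are illustrative, not taken from the paper's code):

```python
# Sketch of StreamingLLM-style KV-cache eviction:
# keep the first `n_sink` cache entries (the "attention sinks")
# plus a sliding window of the `window` most recent entries.
# `n_sink` and `window` are hypothetical parameter names.

def evict_kv_cache(cache, n_sink=4, window=8):
    """Return the cache entries a streaming decoder would retain."""
    if len(cache) <= n_sink + window:
        return list(cache)          # nothing to evict yet
    return list(cache[:n_sink]) + list(cache[-window:])

# Example: token positions 0..19 currently in the cache
positions = list(range(20))
kept = evict_kv_cache(positions, n_sink=4, window=8)
print(kept)  # sink positions 0-3 plus the last 8 positions
```

This is what makes it feel like a patch on a standard decoder rather than a new memory mechanism: the model itself is unchanged, only which KV entries survive.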
-
Meta AI have just released https://arxiv.org/pdf/2309.17453.pdf
This looks related to, or at least trying to solve the same problems as, RMT.
Any thoughts?