(wip) add Flash decoding #5

Open: wants to merge 15 commits into base: master

@FSSRepo commented Mar 6, 2024

This is a flash attention variant that parallelizes along the sequence dimension, which improves occupancy on GPUs with many SMs and speeds up inference on very long contexts. I still need to improve the stability of the kernel, since it sometimes generates incorrect values when performing a KV shift, and to run performance tests on an A100.
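For readers unfamiliar with the idea, here is a minimal CPU sketch of sequence-parallel (split-KV) attention for a single query vector. It is not the kernel in this PR: the names `attend_chunk`, `flash_decode`, `reference_attention`, and the `n_splits` parameter are hypothetical, and the loop over chunks stands in for what would run as independent thread blocks on a GPU. Each chunk returns its local score maximum, softmax denominator, and unnormalized output, and the partial results are then merged with a log-sum-exp style rescaling.

```python
import math

def attend_chunk(q, keys, values):
    """Attend over one KV chunk.

    Returns (local max score, local softmax denominator, unnormalized output).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    # Subtract the local max before exponentiating for numerical stability.
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, denom, out

def flash_decode(q, keys, values, n_splits):
    """Split the KV sequence into chunks, attend to each one, then merge."""
    n = len(keys)
    chunk = (n + n_splits - 1) // n_splits  # ceiling division
    # On a GPU each chunk would be handled by its own block; here it is a loop.
    partials = [
        attend_chunk(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, n, chunk)
    ]
    # Rescale every partial result to a shared global maximum, then combine.
    m_all = max(m for m, _, _ in partials)
    denom = 0.0
    out = [0.0] * len(values[0])
    for m, d, o in partials:
        w = math.exp(m - m_all)
        denom += w * d
        out = [acc + w * x for acc, x in zip(out, o)]
    return [x / denom for x in out]

def reference_attention(q, keys, values):
    """Plain softmax attention over the whole sequence, for comparison."""
    _, denom, out = attend_chunk(q, keys, values)
    return [x / denom for x in out]
```

Because the merge only rescales by `exp(m - m_all)`, the result should match plain attention up to floating-point rounding for any number of splits, which makes a comparison against `reference_attention` a convenient correctness check when debugging cases such as the KV shift.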

FSSRepo pushed a commit that referenced this pull request Jun 8, 2024
* main : don't print special tokens with --grammar

The CLI interface was recently changed to print special control tokens
like the </s> stop message one. This token shouldn't be printed if the
grammar flag was passed, unless the grammar specifies it, because that
breaks shell-scriptability.

* main: use separate stream for control characters

* main: use dprintf and add --ctrl-token-no-out and --ctrl-token-fd-out

* main: dprintf isn't part of the IEEE POSIX standard. Just use write().

* main: remove --ctrl-token-fd-out in favor of fcntl() based detection

* common.cpp: accidentally removed --interactive-first

* main: only merge stdout and control token if not in conversation or grammar mode

* main: rejig control token descriptor handling

* main: must check pipe status on very top of program

* main: renamed --no-special from  --ctrl-token-no-out and other refactoring

* main: refactor ctrl_token_no_out --> no_special

* llama: rename llama_token_is_control_token() to llama_token_is_control()

* main: remove special token file descriptor feature (#5)

---------

Co-authored-by: Brian <mofosyne@gmail.com>

@YixinSong-e

Any update? I would love to do more for this work. :)
