llama : combined beam search + grammar sampling strategy #2923
Comments
Hi team, I have some experience with beam search and I think I can help. Can I work on this, @ggerganov?
Sure, let us know if you make any progress, and make sure to check the comments in the referenced issue.
Sure, @ggerganov. Besides that, is there anything I should be aware of? I'm developing on Apple Silicon.
Nothing specific comes to mind. I recommend writing a separate C++ example, similar to #2926, with extensive logging so we can debug what is being generated. If you open a draft PR, we can give recommendations during development, but you don't have to if you'd rather not. Also, get familiar with the existing
Hi @ggerganov, to be honest, it's quite hard to get started with configuring and debugging the project. Is there a channel where we can discuss how to get started? If no such document exists, I'd like to write one, so it benefits new contributors. FYI, I know how to code in C++, but I don't have much experience building and shipping C++ projects, which may be part of the problem.
@ggerganov a bunch of these cool toys (speculative exec, beam search) seem to be landing either in main or in separate executables in examples. Do you intend to push for consolidating all of this functionality into main at some point?
Hi! I was thinking about looking ahead one character at a time and, as long as there is exactly one option, accepting that character and continuing forward. That is to say, give control back to the model as soon as we need to branch (which tends to happen when filling in the values of this specific JSON). I didn't feel like opening another issue, as this one seemed closely related. Also, this has already been tangentially discussed in e.g. #1773 (comment) (Tobias Lütke)
The thing with "the next token is already known" is that some tokens share prefixes, so many of them could be valid simultaneously under some grammar. So I think it would be better to iterate one character at a time until there is a branch, and only then tokenize and batch decode. Thanks in advance for any thoughts or suggestions!
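To make the idea concrete, here is a minimal sketch in Python of the character-level fast-forward step. It is not llama.cpp code: the grammar is faked as a small finite set of allowed strings (`LANGUAGE`), and `next_chars` and `fast_forward` are hypothetical helper names. A real implementation would query the grammar state instead, then tokenize the forced text and batch decode it.

```python
# Toy stand-in for a grammar: the complete set of strings it accepts (assumption).
LANGUAGE = ['{"answer": "yes"}', '{"answer": "no"}']


def next_chars(prefix, language):
    """Return the set of characters the toy grammar allows right after `prefix`."""
    options = set()
    for s in language:
        if s.startswith(prefix) and len(s) > len(prefix):
            options.add(s[len(prefix)])
    return options


def fast_forward(prefix, language):
    """Extend `prefix` one character at a time while exactly one character is allowed.

    Stops as soon as the grammar branches, has nothing left to add, or the
    prefix is already a complete string (stopping there is also an option).
    """
    while True:
        options = next_chars(prefix, language)
        if len(options) != 1 or prefix in language:
            return prefix
        prefix += next(iter(options))


if __name__ == "__main__":
    forced = fast_forward("", LANGUAGE)
    print(repr(forced))  # everything up to the first branch; the model decides from here
    print(repr(fast_forward(forced + "y", LANGUAGE)))  # the rest is forced once "y" is chosen
```

In this toy case the model is only consulted once, at the point where the value can start with either `y` or `n`; every other character is forced by the grammar.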
One simple workaround is to use the
Any news on this? Also, what @viantirreau suggested would be top-notch.
Hi @nhhung1810, do you need any assistance with this? I put together a quick prototype using Hugging Face Transformers over the weekend. While it's been a while since I last worked extensively with C++, I do have a few years of experience and would be happy to help. Additionally, I have some ideas for faster token sampling that could follow once this feature is implemented. Let me know if I can contribute!
@tom-010 Hi, feel free to tackle it :D
I'll give it a spin next weekend :-)
This feature was proposed by @spion in #2813 (comment)
It should be possible to implement this by combining the existing beam search and grammar sampling features. See the discussion in the referenced comment for more info.
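As a rough illustration of what the combination could look like, here is a small Python sketch of beam search where every candidate extension is checked against a grammar before it can enter the beam. Everything in it is an assumption made for illustration: `grammar_beam_search`, `toy_logprobs`, and the `ALLOWED` set are hypothetical, and none of it uses the llama.cpp API.

```python
import math


def grammar_beam_search(step_logprobs, is_valid_prefix, is_complete,
                        beam_width=2, max_steps=8):
    """Beam search that drops any extension the grammar rejects.

    step_logprobs(tokens) -> dict mapping each next token to its log-probability.
    is_valid_prefix(tokens) -> True if the grammar can still accept this sequence.
    is_complete(tokens) -> True if the sequence is a full sentence of the grammar.
    Returns the best finished (tokens, score) pair, or None if nothing finished.
    """
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            for tok, lp in step_logprobs(tokens).items():
                extended = tokens + (tok,)
                if is_valid_prefix(extended):  # grammar constraint applied here
                    candidates.append((extended, score + lp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if is_complete(seq) else beams).append((seq, score))
        if not beams:
            break
    return max(finished, key=lambda c: c[1]) if finished else None


# Toy model (assumption): the same distribution at every step, preferring "b".
def toy_logprobs(tokens):
    return {"a": math.log(0.3), "b": math.log(0.6), "c": math.log(0.1)}


# Toy grammar (assumption): only these two strings are accepted.
ALLOWED = ["ac", "abb"]


def is_valid_prefix(tokens):
    return any(s.startswith("".join(tokens)) for s in ALLOWED)


def is_complete(tokens):
    return "".join(tokens) in ALLOWED


if __name__ == "__main__":
    best = grammar_beam_search(toy_logprobs, is_valid_prefix, is_complete)
    print(best)
```

The unconstrained model would start with `b`, which the toy grammar never allows. With the grammar check in place the beam only ever holds valid prefixes, and keeping two beams lets the search compare the finished hypotheses `ac` and `abb` instead of committing to the first one that completes.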