Commit
feat: add nccl support for multi-gpu tensor parallelism (#91)
* first commit
* first commit
* add tests mod
* first commit
* refactor repository
* refactor the llm service logic to be able to communicate with the axum service
* fmt
* message content format and parse RequestBody into GenerateRequest
* config comments
* minor mods
* add unit tests for messages to prompt
* improve docs, resolve clippy, add remaining logic to handle responses back to the user
* refactor tests
* refactor tests for llm
* first commit
* llama-nccl
* add clap args
* add features derive to clap
* remove comments
* resolve few issues with finished reason parsing
* address PR comments
* add llama models enums
* add llama models enums
* add llama models enums
* correct meta hf string
* correct meta hf string
* add changes
* handle compilation issues
* update candle versions
* clippy
* fix compilation issues
* minor changes
* add llama_nccl to vllm
* add changes for compilation
* new changes
* adjust engine tests
* resolve bug
* solve a few issues
* add changes
* resolve a few minor bugs and adds info logs for cache engine to load times
* add nccl feature
* update features on server crate
* update features dependencies on backends with nccl
* add changes
* address PR comments
* add small changes
* add small changes
* add small changes
* remove unnecessary feature
* remove unnecessary feature flags from code
* remove unnecessary feature flags from code
* add changes
* add imports
* add imports
* add imports
* add feature gating to llama tests
* only allocate device ids memory
* only allocate device ids memory
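For readers unfamiliar with the term in the commit title, the following is a rough sketch of the idea behind tensor parallelism. It is not taken from this commit. It simulates a column-parallel linear layer on the CPU in plain Rust: the weight matrix is split by output columns across ranks, each rank multiplies the same input by its own shard, and the partial outputs are concatenated. `Matrix`, `column_shard`, and `column_parallel_matmul` are hypothetical names introduced here; a real multi-GPU implementation would presumably hold each shard on a separate device and use an NCCL collective for the gather step.

```rust
/// Row-major dense matrix, used here as a CPU stand-in for a GPU tensor.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f32>,
}

impl Matrix {
    pub fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(data.len(), rows * cols, "shape does not match data length");
        Self { rows, cols, data }
    }

    /// Plain matrix product: `self` (m x k) times `rhs` (k x n).
    pub fn matmul(&self, rhs: &Matrix) -> Matrix {
        assert_eq!(self.cols, rhs.rows, "inner dimensions must agree");
        let mut out = vec![0.0f32; self.rows * rhs.cols];
        for i in 0..self.rows {
            for k in 0..self.cols {
                let a = self.data[i * self.cols + k];
                for j in 0..rhs.cols {
                    out[i * rhs.cols + j] += a * rhs.data[k * rhs.cols + j];
                }
            }
        }
        Matrix::new(self.rows, rhs.cols, out)
    }

    /// Columns `[start, end)` of `self`: the weight shard one rank would hold.
    pub fn column_shard(&self, start: usize, end: usize) -> Matrix {
        assert!(start <= end && end <= self.cols, "column range out of bounds");
        let mut data = Vec::with_capacity(self.rows * (end - start));
        for i in 0..self.rows {
            let row = i * self.cols;
            data.extend_from_slice(&self.data[row + start..row + end]);
        }
        Matrix::new(self.rows, end - start, data)
    }
}

/// Simulated column-parallel linear layer over `world_size` ranks.
/// Each rank computes `x * W_shard`; the loop at the end plays the role
/// of the gather that a collective library would perform across devices.
pub fn column_parallel_matmul(x: &Matrix, w: &Matrix, world_size: usize) -> Matrix {
    assert!(
        world_size > 0 && w.cols % world_size == 0,
        "output columns must divide evenly across ranks"
    );
    let shard_cols = w.cols / world_size;

    // One partial output per rank, computed independently of the others.
    let partials: Vec<Matrix> = (0..world_size)
        .map(|rank| {
            let shard = w.column_shard(rank * shard_cols, (rank + 1) * shard_cols);
            x.matmul(&shard)
        })
        .collect();

    // Concatenate the partial outputs along the column axis, row by row.
    let mut data = Vec::with_capacity(x.rows * w.cols);
    for i in 0..x.rows {
        for p in &partials {
            data.extend_from_slice(&p.data[i * p.cols..(i + 1) * p.cols]);
        }
    }
    Matrix::new(x.rows, w.cols, data)
}
```

Calling `column_parallel_matmul(&x, &w, 2)` should return the same matrix as `x.matmul(&w)`, which is the property that lets the work be divided across devices without changing the result.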
1 parent 986c723, commit 036cc85
Showing 22 changed files with 1,560 additions and 214 deletions.