* Clean up GGUF loading; move model loading to the meta device.
* Remove CPU
* Fix CI and validation scripts (pytorch#154)
* Missing device (pytorch#232)
* Use generator args to group all arguments to generator (pytorch#231)
* prompt
* chat_mode, num_samples
* Move more generator args to use dataclass (pytorch#233)
* prompt
* chat_mode, num_samples
* move more args
* more gen args
* update
* args
* undo some changes
* typos
* Minor lint fixes (pytorch#236)
* Remove redundancy and the int4 linear test from ET tests (pytorch#237)
* remove redundancy
* no int4 linear on ET
* small changes
---------
Co-authored-by: Guang Yang <42389959+guangy10@users.noreply.github.com>
Co-authored-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
Co-authored-by: Mergen Nachin <mnachin@meta.com>
Hi folks, thanks for the great work.
With #135 merged, vLLM could benefit from the torch.compile backend, given its compiler-native integration with PagedAttention kernels.
Is there an easy way to see the latest/nightly MBU for torch.compile on, say, an H100 with Llama 3 70B?
I'm also interested in cold-start compile time.
cc @msaroufim