Investigate memory and performance difference using nvfuser vs torch.compile executor on Qwen2
#1552
Labels
high priority
memory use
nemo
Issues needed to support NVIDIA NeMo models.
performance
thunderfx
for things that could be applicable to the dynamo+thunder frontend
On internal image pjnl-20241213 and on H100:
- With ("sdpa", "torchcompile_cat", "nvfuser")
- With ("sdpa", "torchcompile")

We should investigate what is causing the difference in memory use and performance between these two executor configurations.
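To compare the two configurations on equal footing, a small harness along these lines could be used. This is only a sketch: `measure` is a hypothetical helper, not part of Thunder or nvFuser, and it takes any already-compiled callable because this issue does not show how the executor tuples are passed in. It relies only on standard `torch.cuda` calls and falls back to wall-clock timing without memory stats when no GPU is present.

```python
import time

import torch


def measure(fn, *args, warmup=3, iters=10):
    """Return average latency (ms) and peak CUDA memory (bytes) for fn(*args).

    Hypothetical helper for illustration; `fn` is any callable, for example
    a model compiled with one of the executor configurations above.
    """
    use_cuda = torch.cuda.is_available()

    # Warm-up runs so one-time compilation cost is excluded from the timing.
    for _ in range(warmup):
        fn(*args)

    if use_cuda:
        torch.cuda.synchronize()
        # Restart peak tracking; the peak still counts tensors already alive.
        torch.cuda.reset_peak_memory_stats()

    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if use_cuda:
        # Wait for queued kernels to finish before stopping the clock.
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    peak = torch.cuda.max_memory_allocated() if use_cuda else None
    return {"iters": iters, "avg_ms": 1000.0 * elapsed / iters, "peak_bytes": peak}
```

Calling `measure(compiled_model, batch)` once per configuration (both names are placeholders) gives two dictionaries whose `avg_ms` and `peak_bytes` fields can be compared directly.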
cc @apaz-cli @tfogal