Enable running PyTorch models #207
Now that we've started considering vLLM's tensor layout, what do you think about unifying it? It seems upstream mlc-llm also uses 2D inputs.
Could this also help our CUDA graph integration?
We haven't verified whether 2D inputs are better for performance, or how much CUDA graph actually helps.
The upstream input looks 2D, but it is always either (1, num_total_token) or (batch_size, 1). So their 2D input is essentially 1D.
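To illustrate the point about degenerate 2D shapes, here is a minimal sketch in plain Python. The helper names (`is_effectively_1d`, `flatten_2d`) and the token values are hypothetical and not taken from mlc-llm or vLLM; the sketch only shows that when one axis always has length 1, flattening to 1D loses no information.

```python
def is_effectively_1d(shape):
    """Return True if a 2D shape has at least one axis of length 1."""
    if len(shape) != 2:
        raise ValueError("expected a 2D shape")
    rows, cols = shape
    return rows == 1 or cols == 1


def flatten_2d(tokens):
    """Flatten a nested list (rows of token ids) into one flat list."""
    return [tok for row in tokens for tok in row]


# Shape (1, num_total_token): a single row holding every token.
row_input = [[11, 12, 13, 21, 22]]
# Shape (batch_size, 1): one token per row.
column_input = [[13], [22], [35]]

row_shape = (len(row_input), len(row_input[0]))
column_shape = (len(column_input), len(column_input[0]))

# Both layouts carry exactly one meaningful axis, so the flat form
# contains the same tokens in the same order.
flat_row = flatten_2d(row_input)
flat_column = flatten_2d(column_input)
```

A genuinely 2D layout such as (batch_size, seq_len) with both axes larger than 1 would fail the `is_effectively_1d` check, which is the case the two upstream shapes never hit.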
Yeah, I think it's worth revisiting, though not now. Even if there is no performance boost, it would be nice to unify the layout with upstream unless there is a reason not to.